This section focuses on the industrial waste incineration (IWI) process and designs
an IWI treatment monitoring system based on computer vision technology (CVT). The system
applies image processing and deep learning algorithms to monitor and analyze the
incineration process in real time, with the aim of raising the level of automation and
intelligence in IWI treatment.
2.1. Design of Monitoring System for Waste Incineration Treatment
With the continuous growth of China’s comprehensive national strength and the rapid
development of industrial modernization, living standards have gradually improved.
At the same time, rising consumption has increased the amount of household waste
generated, and the environmental pressure on cities has grown accordingly. CVT can
perform object detection, that is, it can accurately locate and classify objects in
images or image sequences. This study therefore develops monitoring software for IWI
treatment based on CVT. The monitoring system uses a high-temperature-resistant YT-SGFL
optical image acquisition device as its core monitoring equipment. Because of its
high-temperature resistance, the camera lens can be inserted directly into the
high-temperature, high-pressure incinerator to capture key information such as the
combustion conditions and flame shape inside the furnace [12]. This improves production
safety and suits high-temperature working environments, supporting automation in
industrial production. The YT-SGFL monitoring equipment is shown in Fig. 1.
Fig. 1. Schematic diagram of YT-SGFL type monitoring equipment.
In Fig. 1, the monitoring device has exhaust vents and a lens cooling cover in front
of the camera. Cool compressed air or nitrogen injected into the probe housing
continuously purges the camera and lens. This design prevents particles and dust
generated by combustion in the furnace from contaminating the camera and degrading
image quality. The purge flow also cools the front of the camera, preventing thermal
corrosion caused by prolonged exposure to high temperatures. The industrial camera
fitted to this device has a resolution of 2 million pixels, which ensures clear and
stable images. Its signal-to-noise ratio reaches 50 dB, so it can capture low-noise,
high-contrast images under various working conditions. The housing is made of
stainless steel, which allows it to work stably within a temperature range of -10 °C
to 50 °C and adapt to various environmental conditions. The industrial waste combustion
detection system based on CVT is shown in Fig. 2.
Fig. 2. Industrial waste combustion detection system based on computer vision technology.
In Fig. 2, the flame image inside the incinerator is captured by the optical equipment
and sent to a video splitter. One output serves as a real-time monitoring video for
staff to observe, and the other is transmitted as a signal to the computer and stored.
In the computer, the images are processed with computer vision and image processing
techniques to infer and evaluate the combustion state. The processed information is
transmitted to the control center, where it informs staff of the condition of the
combustion furnace and supports decision-making. In flame image processing,
preprocessing and cleaning first enhance image visibility, eliminate interference,
and define flame boundaries. Because image acquisition is imperfect and the images
contain noise, image filtering is used to improve image quality. By enhancing the
image, highlighting features, and removing unnecessary parts, filtering improves both
the image quality and its suitability for subsequent processing and analysis. The
mathematical expression for image median filtering is shown in Eq. (1).
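A standard form of the median filter that is consistent with the symbols defined for Eq. (1) is given below. It is an assumption based on those definitions rather than a verbatim copy of the published expression. The index $l$ for the second offset component is introduced here for illustration, and $m$ and $n$ are assumed to be odd.

```latex
I'(i, j) = \operatorname{median}\bigl\{\, I(i + k,\; j + l) \;:\; (k, l) \in K \,\bigr\},
\qquad
K = \Bigl\{ (k, l) \;:\; |k| \le \tfrac{m - 1}{2},\; |l| \le \tfrac{n - 1}{2} \Bigr\}
```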
In Eq. (1), $I$ and $I'$ represent the original image and the filtered image, respectively.
$i$ and $j$ are the pixel coordinates of the image. $m$ and $n$ are the dimensions
of the filtering window. $K$ represents the offset of pixels in the window. Image
denoising is an important step in digital image processing that aims to reduce image
noise. This study uses adaptive filtering for denoising. The algorithm adjusts the
filtering effect according to local variance and is effective at removing Gaussian
noise. The minimum mean square error form of the adaptive filter is shown in Eq.
(2).
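The symbols defined for Eq. (2) match the weight update of the least mean squares (LMS) adaptive filter. Assuming that this is the intended form, the update can be written as follows. The desired signal $d(n)$ is a name introduced here for illustration, and some texts absorb a factor of 2 into $\mu$.

```latex
e(n) = d(n) - \omega^{T}(n)\, x(n),
\qquad
\omega(n + 1) = \omega(n) + \mu\, e(n)\, x(n)
```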
In Eq. (2), $\omega(n)$ represents the weight vector of the filter at time $n$. $\mu$ represents
the step size factor. $e(n)$ and $x(n)$ represent the error signal and input signal
at time $n$, respectively. Image edge detection (IED) is a key step in image recognition.
It identifies points in the image where brightness changes significantly and is
an important basis for image segmentation. The Canny operator is a commonly used IED
algorithm proposed by John F. Canny in 1986, which performs well and is widely
adopted [13]. Therefore, this study applies the Canny operator for IED. The workflow of the Canny
operator includes steps such as denoising and calculating the gradient amplitude and
direction. Because edge detection is highly sensitive to noise, a Gaussian filter is
first used to smooth the image and reduce noise. The mathematical expression for smoothing
is shown in Eq. (3).
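Assuming the usual Canny formulation, the smoothing step is a convolution of the input image, written here as $I$, with a two-dimensional Gaussian kernel. The standard deviation $\sigma$ is not named in the text and is introduced here as an assumption.

```latex
I''(i, j) = G(i, j) * I(i, j),
\qquad
G(i, j) = \frac{1}{2\pi\sigma^{2}} \exp\!\left( -\frac{i^{2} + j^{2}}{2\sigma^{2}} \right)
```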
In Eq. (3), $I''$ represents the smoothed image, and $G(i, j)$ represents the Gaussian function.
The calculation formula for gradient amplitude is shown in Eq. (4).
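In the standard Canny formulation, the gradient amplitude is the Euclidean norm of the two partial derivatives of the smoothed image. The subscripts on $p$ below are an assumed notation for the derivatives along $a$ and $b$.

```latex
M(a, b) = \sqrt{\, p_a(a, b)^{2} + p_b(a, b)^{2} \,}
```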
In Eq. (4), $M$ represents the amplitude of the gradient, $p$ represents the partial derivative,
and $(a,b)$ represents the coordinates of the function at a specific position in two-dimensional
space. The formula for gradient direction is shown in Eq. (5).
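Under the same assumption, with $p_a$ and $p_b$ denoting the partial derivatives of the smoothed image along $a$ and $b$ (a notation introduced here for illustration), the gradient direction takes the usual form:

```latex
\theta(a, b) = \arctan\!\left( \frac{p_b(a, b)}{p_a(a, b)} \right)
```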
In Eq. (5), $\theta$ represents the direction of the gradient. IED extraction based
on the Canny operator is shown in Fig. 3.
Fig. 3. Schematic diagram of image edge detection and extraction based on the Canny operator.
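Two of the preprocessing steps described above, median filtering and the computation of gradient amplitude and direction, can be sketched in pure Python. This is a minimal illustration, not the system's implementation: the Sobel kernels are an assumption (the text does not say which derivative approximation is used), edge pixels are handled by clamping, and the non-maximum suppression and hysteresis thresholding of the full Canny operator are omitted.

```python
import math
from statistics import median

# Sobel kernels (assumed derivative approximation).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def median_filter(img, m=3, n=3):
    """Replace each pixel by the median of its m x n window (borders clamped)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [
                img[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                for di in range(-(m // 2), m // 2 + 1)
                for dj in range(-(n // 2), n // 2 + 1)
            ]
            out[i][j] = median(window)
    return out


def gradient(img):
    """Return gradient amplitude and direction maps for interior pixels."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            gx = sum(SOBEL_X[k][l] * img[i + k - 1][j + l - 1]
                     for k in range(3) for l in range(3))
            gy = sum(SOBEL_Y[k][l] * img[i + k - 1][j + l - 1]
                     for k in range(3) for l in range(3))
            mag[i][j] = math.hypot(gx, gy)   # gradient amplitude
            ang[i][j] = math.atan2(gy, gx)   # gradient direction in radians
    return mag, ang


# Toy 5 x 5 image: dark left part, bright right part, one isolated noise pixel.
image = [[0, 0, 0, 100, 100] for _ in range(5)]
image[2][1] = 255

clean = median_filter(image)   # the isolated bright pixel is removed
mag, ang = gradient(clean)     # the strongest response lies on the vertical edge
```

In this toy case the median filter removes the single noisy pixel without blurring the step edge, which is why it is a common choice ahead of edge detection.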
Fig. 3 shows the results of Canny edge detection. The method aims to identify as many
true edges in the image as possible while keeping the detected edges as close as
possible to the actual edges. The preprocessing steps of image filtering and edge
detection prepare the image for cleaning and further improve the accuracy of flame
image recognition and detection. They also prepare the data for subsequent image
processing and improve the efficiency of the whole recognition and detection process.
In CVT, image segmentation and recognition algorithms can provide real-time status
assessment and feedback for the waste incineration process. By detecting the shape
and position of the flame in real time, the system can adjust the incineration
conditions promptly and optimize combustion efficiency. By combining image processing
with deep learning, the system can quantitatively evaluate combustion during
incineration, for example by calculating indicators such as the proportion of
combustion area and the combustion intensity, and can give real-time feedback on the
working status of the incinerator as a basis for operators' decisions. In addition,
abnormal combustion during waste incineration, such as excessive flames, abnormal
temperature, and incomplete combustion, may threaten the environment and safety.
Computer vision systems can provide early warning by analyzing abnormal phenomena in images.
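The quantitative indicators mentioned above can be illustrated with a short sketch. It assumes a grayscale frame given as a list of rows, and it treats pixels at or above a brightness threshold as flame. The function names, the threshold of 200, and the warning limits are hypothetical values chosen for illustration, not parameters reported for the system.

```python
def combustion_indicators(gray, flame_threshold=200):
    """Return (area_ratio, mean_intensity) for pixels at or above the threshold."""
    pixels = [p for row in gray for p in row]
    flame = [p for p in pixels if p >= flame_threshold]
    area_ratio = len(flame) / len(pixels)                     # proportion of combustion area
    mean_intensity = sum(flame) / len(flame) if flame else 0.0  # crude combustion intensity
    return area_ratio, mean_intensity


def combustion_status(area_ratio, low=0.05, high=0.60):
    """Map the flame area ratio to a coarse status label (hypothetical limits)."""
    if area_ratio > high:
        return "excessive flame"
    if area_ratio < low:
        return "possible incomplete combustion"
    return "normal"


# Toy 4 x 4 frame with a bright flame region in the upper right corner.
frame = [
    [0, 0, 250, 250],
    [0, 0, 210, 240],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
ratio, intensity = combustion_indicators(frame)
status = combustion_status(ratio)
```

A deployed system would derive the flame region from the segmentation output rather than a fixed threshold, but the indicators would be computed in the same way.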
2.2. IS-IR Algorithm based on CVT
In CVT, the IS-IR algorithm is crucial for understanding and parsing image content.
Accurate segmentation and recognition allow key information in an image to be
extracted effectively, which opens new possibilities for IWI treatment monitoring
systems. With the IS-IR algorithm, the monitoring system for waste incineration
treatment can extract valuable information from complex images and monitor and
precisely control the incineration process in real time, thereby improving processing
efficiency and safety. U-Net is a CNN architecture designed specifically for image
segmentation. Its name comes from its U-shaped structure, which consists of a
down-sampling path and an up-sampling path [14]. The structure of the U-Net image segmentation algorithm is shown in Fig. 4.
Fig. 4. The network structure of the U-Net image segmentation algorithm.
In Fig. 4, the U-Net algorithm mainly consists of a contraction path and an expansion path.
The contraction path is a typical CNN that captures image context information and
learns features. The expansion path restores object boundaries through upsampling,
convolution, and skip connections, preserving detailed information [15]. In IWI treatment monitoring, U-Net achieves precise flame segmentation
and recognition. However, it may be limited on complex waste feature recognition
tasks such as the analysis of fine shape, size, and state. Each version of the YOLO
model offers models of different sizes, such as nano, small, medium, and huge.
Although the ordering of these sizes by weight size and execution time is similar
across versions, some newer versions of YOLO achieve a better balance between accuracy
and execution time while keeping computational complexity low. For example, YOLOv5
improves accuracy and shortens execution time while optimizing the network structure.
Therefore, this study adopts the YOLOv5 recognition algorithm. YOLOv5 is known for its
real-time performance and high accuracy in object detection, and can quickly and
accurately identify the position and category of target objects [16]. Its focus on global information makes it perform better than U-Net on
object detection problems under different scales, rotations, and lighting conditions
[17]. The YOLOv5 network architecture is shown in Fig. 5.
Fig. 5. YOLOv5 network architecture.
In Fig. 5, for an object detection task, the YOLOv5 algorithm first extracts features
from the input image through the backbone network. These features are then fed into
the neck network for deeper feature processing. Finally, the processed features are
input into the head network, where the algorithm determines the target category and
corrects the coordinates of candidate boxes based on position offsets, thereby
obtaining more accurate object detection results [18]. On the basis of the traditional YOLOv5 flame detection framework, this study uses
the RGB, HSV, and Lab color spaces to extract flame color features. The RGB color
space provides intuitive color distribution information, the HSV space helps distinguish
brightness, saturation, and hue, and the Lab color space captures flame color changes
more accurately under different lighting conditions. To capture the color information
of flames, this study uses color clustering to classify flame regions
and conducts fine-grained analysis of flame regions at different combustion stages.
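The idea of clustering flame pixels by color can be sketched with the standard library alone. The example below converts RGB pixels to HSV with `colorsys` and groups them by hue with a small one-dimensional k-means. The choice of k-means, the use of hue only, and the helper names are assumptions made for illustration, since the text does not name the clustering algorithm. The Lab conversion is omitted, and hue wraparound near pure red is ignored.

```python
import colorsys


def rgb_to_hsv_deg(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation, value)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return h * 360, s, v


def kmeans_1d(values, k=2, iters=20):
    """Deterministic 1-D k-means: centers start evenly spaced between min and max."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * c / (k - 1) for c in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign every value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return centers, labels


# Toy pixels: two deep orange-red samples and two bright yellow samples.
pixels = [(255, 40, 0), (250, 60, 10), (255, 230, 80), (255, 240, 120)]
hues = [rgb_to_hsv_deg(*p)[0] for p in pixels]
centers, labels = kmeans_1d(hues, k=2)
```

Here the two reddish pixels fall into one cluster and the two yellowish pixels into the other, which is the kind of separation that could be used to distinguish flame regions at different combustion stages.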
The system can identify the trend of combustion state changes from the captured
flame color information and help optimize the incineration process according
to the combustion state. The loss function of the YOLOv5 algorithm is shown
in Eq. (6).
In Eq. (6), $L$ represents the loss function. $IoU$ represents the intersection over union.
$\alpha$ and $\beta$ respectively represent the Euclidean distance between the predicted
box and the true box. $\varepsilon$ represents the adjustable parameter. $\gamma$ represents
the auxiliary term. The definition of the auxiliary term is shown in Eq. (7).
In Eq. (7), $h_t$ and $w_t$ represent the height and width of the true box, respectively.
$h_p$ and $w_p$ represent the height and width of the predicted box, respectively.
$\pi$ represents the mathematical constant pi. The definition of the adjustable parameter is shown in Eq. (8).
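The symbols described for Eqs. (6) to (8) match the complete IoU (CIoU) box regression loss, which YOLOv5 uses by default. Assuming that this is the intended form, the sketch below computes it for two boxes given as (center x, center y, width, height). The function name and box format are choices made for illustration, the mapping of $\gamma$ and $\varepsilon$ to the aspect-ratio term and its weight is an interpretation of the text, and the objectness and classification parts of the full YOLOv5 loss are not included.

```python
import math


def ciou_loss(pred, true):
    """Return (loss, iou) for boxes given as (x_center, y_center, width, height)."""
    px, py, pw, ph = pred
    tx, ty, tw, th = true

    # Corner coordinates of both boxes.
    p_x1, p_y1, p_x2, p_y2 = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    t_x1, t_y1, t_x2, t_y2 = tx - tw / 2, ty - th / 2, tx + tw / 2, ty + th / 2

    # Intersection over union.
    inter_w = max(0.0, min(p_x2, t_x2) - max(p_x1, t_x1))
    inter_h = max(0.0, min(p_y2, t_y2) - max(p_y1, t_y1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter
    iou = inter / union

    # Squared distance between box centers, and squared diagonal of the
    # smallest box enclosing both (the two distance quantities of the loss).
    center_dist2 = (px - tx) ** 2 + (py - ty) ** 2
    diag2 = ((max(p_x2, t_x2) - min(p_x1, t_x1)) ** 2
             + (max(p_y2, t_y2) - min(p_y1, t_y1)) ** 2)

    # Auxiliary aspect-ratio quantity (gamma) and its adjustable weight (epsilon).
    gamma = 4 / math.pi ** 2 * (math.atan(tw / th) - math.atan(pw / ph)) ** 2
    epsilon = gamma / ((1 - iou) + gamma) if gamma > 0 else 0.0

    loss = 1 - iou + center_dist2 / diag2 + epsilon * gamma
    return loss, iou


# Identical boxes give zero loss; a shifted box is penalized.
loss_same, iou_same = ciou_loss((5, 5, 2, 2), (5, 5, 2, 2))
loss_shift, iou_shift = ciou_loss((0, 0, 2, 2), (1, 0, 2, 2))
```

Compared with a plain IoU loss, the center-distance term still provides a useful signal when the boxes barely overlap, and the aspect-ratio term penalizes boxes of the wrong shape.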
Artificial neural networks draw on the attention mechanism of the human brain to process
continuous inputs. With this mechanism, a neural network can weigh and focus on
different input data according to task requirements, thereby extracting relevant
information more accurately [19]. To improve flame image recognition, this study modifies the
original YOLOv5 architecture so that it captures local characteristics and contextual
information of images more efficiently. Specifically, the SimAM attention module is
integrated into the original YOLOv5, which further improves the model’s ability
to identify flame image features. The SimAM module examines the interrelationships
between features, enabling the model to prioritize target features and improve
prediction precision [20]. The energy function defined by the SimAM attention module is expressed as Eq. (9).
In Eq. (9), $e$ represents the energy function. $x_i$ represents the target neuron of the input
feature. $M$ represents the number of neurons. $c_t$ and $d_t$ are the weight and bias,
respectively. The final energy function after regularization is shown in Eq. (10).
In Eq. (10), an energy function exists for each channel, and solving these directly
requires a large amount of computation. Therefore, the closed-form solutions for the
weight and bias are calculated first. The mean value of the neurons, $v_t$, is shown in Eq. (11).
The variance of the neurons, $\sigma^2_t$, is shown in Eq. (12).
The closed-form solution of the weight is shown in Eq. (13).
The closed-form solution of the bias is shown in Eq. (14).
As shown in Eqs. (13) and (14), the closed-form solutions for the weight and bias are obtained on a single channel,
so it is reasonable to assume that all pixels in that channel follow the same distribution.
Based on this assumption, the mean and variance can be calculated once over all
neurons in the channel and reused for every neuron on that channel. This significantly
reduces computational cost and avoids iterative calculation of the weight and bias
for each position. The minimum energy function is expressed as Eq. (15).
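The closed-form computation described above can be sketched for a single channel in pure Python. The sketch follows the commonly published SimAM formulation, in which the attention weight of a neuron is the sigmoid of the inverse minimum energy. It is an assumption that the modified YOLOv5 in this study uses exactly this form, and the function name and the regularization value of 1e-4 are illustrative choices.

```python
import math


def simam_channel(channel, lam=1e-4):
    """Apply SimAM-style attention to one channel given as a list of rows.

    The channel mean and variance are computed once and reused for every
    position, as the shared-distribution assumption allows.
    """
    values = [v for row in channel for v in row]
    n = len(values) - 1                        # number of neurons minus one
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / n

    out = []
    for row in channel:
        out_row = []
        for t in row:
            # Inverse of the minimum energy: distinctive neurons score higher.
            inv_energy = (t - mean) ** 2 / (4 * (var + lam)) + 0.5
            weight = 1 / (1 + math.exp(-inv_energy))   # sigmoid
            out_row.append(t * weight)
        out.append(out_row)
    return out


# Toy channel: one neuron (value 5) stands out from its neighbours.
refined = simam_channel([[1.0, 1.0], [1.0, 5.0]])
flat = simam_channel([[2.0, 2.0], [2.0, 2.0]])
```

The neuron that differs most from the channel mean receives the largest attention weight, while a uniform channel is scaled by the same constant everywhere. No learnable parameters are added, which is the main appeal of this module.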
The architecture of the IWI treatment monitoring system based on computer vision is
shown in Fig. 6. In the system architecture, median filtering is first applied
to accentuate salient features and delineate the flame boundary. The Canny operator
then performs edge detection, extracting an image that contains the edge information
of the flame particle trajectories. Next, the U-Net image segmentation algorithm
segments and recognizes the flame. Finally, the YOLOv5 image recognition algorithm
identifies the position and category of target objects. Flame image processing
proceeds as follows. First, the BGR three-channel flame image is converted into a
single-channel grayscale image, which reduces computational complexity and highlights
the brightness information of the flame. Next, the time and date information displayed
at the top of the grayscale image is removed so that non-flame features do not
interfere with the edge detection results. The Canny operator is then applied to the
grayscale flame image to obtain the edge contours of the flame particles. The Hough
line transform then fits the detected flame particle trajectories with straight lines,
yielding a clearer combustion trajectory image. Finally, the system monitors the
incineration process by combining the motion trajectories of the flame particles with
the detection results of the target objects.
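The flame image processing steps above can be sketched end to end in pure Python. This is a simplified illustration under stated assumptions: the frame is a list of rows of (B, G, R) tuples, the timestamp overlay is assumed to occupy a known number of top rows, and a plain brightness threshold stands in for the Canny operator to keep the example short. The function names, the threshold of 128, and the one-degree angular resolution are hypothetical choices.

```python
import math


def bgr_to_gray(img):
    """Convert rows of (B, G, R) tuples to grayscale with BT.601 luma weights."""
    return [[0.114 * b + 0.587 * g + 0.299 * r for (b, g, r) in row] for row in img]


def crop_overlay(gray, overlay_rows):
    """Drop the top rows that contain the time and date overlay."""
    return gray[overlay_rows:]


def hough_lines(edge_points):
    """Accumulate votes in (theta in degrees, rho) space for edge points (x, y)."""
    acc = {}
    for x, y in edge_points:
        for theta in range(180):
            t = math.radians(theta)
            rho = round(x * math.cos(t) + y * math.sin(t))
            acc[(theta, rho)] = acc.get((theta, rho), 0) + 1
    return acc


# Toy 8 x 10 frame: two white overlay rows on top, one bright horizontal streak.
frame = [[(0, 0, 0)] * 10 for _ in range(8)]
frame[0] = [(255, 255, 255)] * 10
frame[1] = [(255, 255, 255)] * 10
frame[5] = [(0, 200, 255)] * 10          # flame-colored streak in BGR order

gray = crop_overlay(bgr_to_gray(frame), overlay_rows=2)

# Stand-in for Canny: keep bright pixels as "edge" points (x, y).
edges = [(x, y) for y, row in enumerate(gray) for x, v in enumerate(row) if v > 128]

acc = hough_lines(edges)
votes_horizontal = acc[(90, 3)]          # line with normal at 90 degrees, rho = 3
```

Every streak pixel votes for the bin at 90 degrees and rho 3, so that bin identifies the horizontal trajectory. With this coarse binning, neighbouring angles can collect the same number of votes, so a practical implementation would add peak selection or rely on a library routine.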
Fig. 6. Architecture of industrial waste incineration treatment monitoring system
based on computer vision.