

  1. (Department of Basic Medical Education, Dazhou Vocational College of Chinese Medicine, Dazhou, 635000, China)



Computer vision, Industrial waste, Incineration treatment, Image recognition, Image segmentation

Abbreviations

CVT: Computer vision technology
IWI: Industrial waste incineration
IS-IR: Image segmentation and image recognition
CNN: Convolutional neural network
IED: Image edge detection
AP: Average precision
mAP: Mean average precision
FPS: Frames per second
P-R curve: Precision recall curve
Faster R-CNN: Faster regions with CNN features
SSD: Single shot MultiBox detector
Mask R-CNN: Mask regions with CNN features

1. Introduction

Industrialization in China is accelerating, and the amount of industrial waste generated is increasing accordingly. In the current industrial production environment, waste disposal is particularly important. Among the available methods, incineration is a common approach whose process requires effective monitoring, which is of profound significance for environmental protection and resource utilization [1]. Although a large share of the waste in China is recyclable and could be reused during treatment, urban household waste is still mainly treated by landfill and incineration. The growth in urban household waste and the slow pace of innovation in waste treatment have had a detrimental impact on the urban environment. In contrast, incinerating industrial waste not only reduces the volume of waste and the dependence on landfills but also enables resource reuse through energy recovery. In addition, the harmful components of industrial waste are decomposed at high temperatures during incineration, which reduces environmental pollution. However, traditional waste incineration processes often rely on manual identification and classification of waste and on manual monitoring of key parameters such as temperature and pressure. This approach is not only inefficient but also prone to human errors and omissions [2]. Moreover, industrial waste is burned in a high-temperature, high-speed circulating fluidized bed, which makes real-time and accurate diagnosis of combustion stability difficult [3]. An unstable fluidized bed in the furnace may lead to dangerous situations such as detonation [4].
In recent years, computer vision technology (CVT) has shown excellent application value in many fields, especially in image recognition and data analysis, and its high accuracy and efficiency have been widely recognized [5]. Therefore, in this context, this study innovatively combines CVT in the process of industrial waste incineration (IWI) and applies image analysis algorithms to the identification and classification of waste. It aims to achieve efficient and accurate monitoring of the IWI processing process. The research content is divided into four parts. Part 1 is about the research background of IWI and the existing problems in the current IWI process. Part 2 provides an overview of the current research status of flame detection and CVT. Part 3 designs an IWI processing and monitoring system based on CVT, including system design, image cleaning, and computer vision-based image segmentation and image recognition (IS-IR) algorithms. Part 4 verifies the effectiveness of the monitoring system.

Flame detection in waste incineration has attracted considerable attention from industry experts and has produced a large number of research results. Zhao et al. designed a flame detection system based on an improved YOLOv3 algorithm to improve the accuracy and efficiency of flame detection. This method adopted an optimized multi-scale detection network and introduced new scale features. Compared with traditional detection methods, it achieved significant improvements in accuracy and efficiency [6]. To effectively identify fires in large open buildings and outdoor areas, Hakeem et al. designed a flame detection method based on computer vision systems. This method used an integer Haar wavelet transform to frame and analyze input videos and introduced a frame difference technique to reduce false positives. It showed excellent performance in fire identification [7]. Liu et al. proposed a computer vision security monitoring platform based on federated learning to address the privacy and data transmission cost challenges of traditional visual object detection. This design used federated visual object detection to develop computer vision applications on the FedVision machine learning engineering platform. This method significantly improved efficiency and reduced costs [8].

CVT plays an important role in waste incineration detection. Dong et al. proposed a multi-level detection scheme using CVT to achieve local and global health monitoring of the structure. At the local level, this method could efficiently detect damage such as cracks, peeling, and delamination. At the global level, it could perform displacement measurement and comprehensive damage detection. This method has shown significant effectiveness and feasibility in health monitoring [9]. Zhao et al. proposed using CVT to construct a network security system in response to the increasingly serious network security issues. This method extracted statistical features from multiple sources, such as binary files, emails, and packet flows, and applied them to multiple key areas, such as phishing attempt detection, malware detection, and traffic anomaly detection. This method not only effectively addressed known network security threats but also had excellent processing capabilities for complex and unknown threats [10]. Li et al. proposed a convolutional neural network (CNN) architecture based on CVT to improve the efficiency and accuracy of animal management. This architecture could achieve comprehensive monitoring and analysis of farm animals through applications such as image classification, object detection, pose estimation, and tracking. This method significantly optimized the management process of farm animals [11].

In summary, although important research achievements have been made in flame detection and in CVT, most existing literature focuses either on flame detection or on the application of computer vision in safety monitoring, and work combining the two for IWI treatment is still rare. Accordingly, this study fills this gap by developing CVT-based IWI processing monitoring software and a comprehensive computer vision-based monitoring system that tracks key parameters in real time throughout the incineration process, with the objective of enhancing safety and efficiency. This provides new contributions and perspectives for research in this field.

2. Design of Monitoring Software for Industrial Waste Incineration Treatment

This section specifically focuses on the incineration process of industrial waste and designs an IWI treatment monitoring system based on CVT. The system adopts image processing technology and deep learning algorithms for real-time monitoring and analysis of the incineration process, aiming to improve the automation and intelligence level of IWI processing.

2.1. Design of Monitoring System for Waste Incineration Treatment

With the continuous growth of China’s comprehensive national strength and the rapid development of industrial modernization, people’s living standards have gradually improved. At the same time, with the increase of consumption level, the amount of household waste generated continues to rise, and the pressure on the environment from cities has also increased. CVT can perform object detection tasks, that is, accurately locate and classify objects in images or image sequences. Therefore, research is being conducted on the development and design of monitoring software for IWI treatment using CVT. The monitoring system uses high-temperature resistant YT-SGFL optical image acquisition equipment as the core monitoring equipment. This device has excellent high-temperature resistance and can directly penetrate the camera lens into the high-temperature and high-pressure incinerator, capturing key information such as combustion conditions and flame shape inside the furnace [12]. This not only greatly improves production safety but also adapts to high-temperature working environments, providing strong support for the automation process in industrial production. The diagram of the YT-SGFL monitoring equipment is shown in Fig. 1.

Fig. 1. Schematic diagram of YT-SGFL type monitoring equipment.

../../Resources/ieie/IEIESPC.2026.15.3.309/fig1.png

In Fig. 1, the monitoring device has carefully designed exhaust vents and lens cooling covers in front of the camera. By injecting cooler compressed air or nitrogen into the probe housing, the camera and lens can be continuously purged. This design can effectively prevent particles or dust generated by combustion in the furnace from contaminating the camera, thereby affecting image quality. On the other hand, this blowing method can also cool the front of the camera, preventing thermal corrosion problems caused by prolonged exposure to high temperatures. The industrial camera equipped with this device has a high resolution of 2 million pixels, ensuring the clarity and stability of the images. More notably, its signal-to-noise ratio is as high as 50dB, which means it can capture low-noise and high contrast images under various working conditions. The external material is made of high-quality stainless steel, which enables it to work stably within a wide temperature range of -10 °C to 50 °C and adapt to various environmental conditions. The industrial waste combustion detection system based on CVT is shown in Fig. 2.

Fig. 2. Industrial waste combustion detection system based on computer vision technology.

../../Resources/ieie/IEIESPC.2026.15.3.309/fig2.png

In Fig. 2, the flame image inside the incinerator is captured by optical equipment and sent to a video splitter. On the one hand, it serves as a real-time monitoring video for staff to observe, and on the other hand, it is stored in the form of a signal to the computer. Internally, images are processed through computer vision and image processing techniques to infer the combustion state and evaluate it. The processed information is transmitted to the control center, where the staff can understand the condition of the combustion furnace and provide suggestions for decision-making. In flame image processing, computer vision first enhances image visibility, eliminates interference, and defines flame boundaries through preprocessing and cleaning. Considering the limitations of image acquisition and the presence of noise in the image, image filtering is adopted to improve image quality. By modifying or enhancing the image, highlighting features or removing unnecessary parts, the image quality and the applicability of subsequent processing analysis are effectively improved. The mathematical expression for image median filtering is shown in Eq. (1).

(1)
$ I'(i, j) = \sum_{m,n} I(i+m, j +n)\times K(m,n). $
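As a minimal sketch of Eq. (1), the following Python code applies the weighted window sum with a kernel `K` and, for comparison, a median filter, which replaces each pixel by the median of its window instead of a weighted sum. The function names, the 3×3 box kernel, the test patch, and the border clamping are illustrative assumptions rather than details taken from the study.

```python
from statistics import median


def linear_filter(image, kernel):
    """Weighted window sum of Eq. (1): I'(i, j) = sum_{m,n} I(i+m, j+n) * K(m, n).

    Offsets m, n run over the kernel window centred on (i, j). Border pixels
    are handled by clamping coordinates (an assumption, not from the study).
    """
    rows, cols = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for m in range(-half, half + 1):
                for n in range(-half, half + 1):
                    ii = min(max(i + m, 0), rows - 1)
                    jj = min(max(j + n, 0), cols - 1)
                    acc += image[ii][jj] * kernel[m + half][n + half]
            out[i][j] = acc
    return out


def median_filter(image, size=3):
    """Replace each pixel by the median of its size x size neighbourhood."""
    rows, cols = len(image), len(image[0])
    half = size // 2
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            window = [
                image[min(max(i + m, 0), rows - 1)][min(max(j + n, 0), cols - 1)]
                for m in range(-half, half + 1)
                for n in range(-half, half + 1)
            ]
            out[i][j] = median(window)
    return out


# Flat patch of value 10 with a single impulse ("salt") pixel in the centre.
noisy = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
box_kernel = [[1 / 9] * 3 for _ in range(3)]  # mean (box) kernel
mean_out = linear_filter(noisy, box_kernel)
median_out = median_filter(noisy)
```

On this patch the box kernel only spreads the impulse over its neighbours, while the median filter removes it entirely, which is why median filtering is often preferred for impulse-like noise.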

In Eq. (1), $I$ and $I'$ represent the original image and the filtered image, respectively. $i$ and $j$ are the pixel coordinates of the image. $m$ and $n$ are the offsets of pixels within the filtering window, and $K$ represents the filter kernel defined over that window. Image denoising is an important step in digital image processing, aimed at reducing image noise. This study uses adaptive filtering for denoising. This algorithm adjusts the filter according to the local variance and is effective at suppressing Gaussian noise. The weight update rule of the adaptive filter, which minimizes the mean square error, is shown in Eq. (2).

(2)
$ \omega(n+1) = \omega(n) + \mu\, e(n)\, x(n). $
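The update in Eq. (2) has the form of the least-mean-square (LMS) rule. The sketch below applies it to a one-dimensional signal for brevity. The tap count, step size, and synthetic test signal are assumptions chosen for illustration, and the study's image-domain filter may be organized differently.

```python
import random


def lms_filter(x, d, num_taps=4, mu=0.05):
    """Adaptive filter using the update of Eq. (2): w(n+1) = w(n) + mu * e(n) * x(n).

    x: input samples, d: desired samples, mu: step size factor.
    Returns the final weight vector and the per-sample error e(n).
    """
    w = [0.0] * num_taps
    errors = []
    for n in range(num_taps - 1, len(x)):
        window = [x[n - k] for k in range(num_taps)]      # x(n), x(n-1), ...
        y = sum(wk * xk for wk, xk in zip(w, window))     # filter output
        e = d[n] - y                                      # error signal e(n)
        w = [wk + mu * e * xk for wk, xk in zip(w, window)]
        errors.append(e)
    return w, errors


# Synthetic check: the desired signal is a known 2-tap combination of the input,
# so the adapted weights should approach [0.5, -0.3, 0, 0].
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
d = [0.5 * x[n] - 0.3 * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
weights, errors = lms_filter(x, d)
```

A larger `mu` converges faster but can become unstable, which is the usual trade-off when choosing the step size factor.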

In Eq. (2), $\omega(n)$ represents the weight vector of the filter at time $n$. $\mu$ represents the step size factor. $e(n)$ and $x(n)$ represent the error signal and input signal at time $n$, respectively. Image edge detection (IED) is a key feature in image recognition, which identifies points in the image with significant changes in brightness and is an important basis for image segmentation. The Canny operator is a commonly used IED algorithm proposed by John F. Canny in 1986, which exhibits excellent performance and is widely adopted [13]. Therefore, this study applies the Canny operator for IED. The workflow of the Canny operator includes steps such as denoising and calculating gradient amplitude and direction. Due to the extreme sensitivity of edge detection to noise, Gaussian filters are first used to smooth the image and reduce noise. The mathematical expression for smoothing is shown in Eq. (3).

(3)
$ I''(i, j) = G(i, j) \ast I(i, j). $

In Eq. (3), $I''$ represents the smoothed image, $G(i, j)$ represents the Gaussian function, and $\ast$ denotes convolution. The calculation formula for gradient amplitude is shown in Eq. (4).

(4)
$ M(i, j) = \sqrt{p_a(i, j)^2 + p_b(i, j)^2}. $

In Eq. (4), $M$ represents the amplitude of the gradient, $p_a$ and $p_b$ represent the partial derivatives of the smoothed image along the two coordinate directions $a$ and $b$, and $(i, j)$ represents the pixel position in two-dimensional space. The formula for gradient direction is shown in Eq. (5).

(5)
$ \theta(i, j) = \arctan\left(\frac{p_a(i, j)}{p_b(i, j)}\right). $

In Eq. (5), $\theta$ represents the direction of the gradient. The diagram of IED extraction based on Canny operator is shown in Fig. 3.

Fig. 3. Schematic diagram of image edge detection and extraction based on Canny operator

../../Resources/ieie/IEIESPC.2026.15.3.309/fig3.png
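The first stages of the Canny workflow described above (Gaussian smoothing, gradient amplitude, and gradient direction) can be sketched as follows. The Sobel kernels, the 5×5 Gaussian window, the border clamping, and the synthetic step image are assumptions made for illustration. The sketch uses `atan2` as the quadrant-aware form of the arctangent, with the row derivative as numerator, and the axis convention of the original system is not specified.

```python
import math


def correlate(image, kernel):
    """Slide `kernel` over `image` (2-D lists), clamping coordinates at the borders."""
    rows, cols = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for m in range(-half, half + 1):
                for n in range(-half, half + 1):
                    ii = min(max(i + m, 0), rows - 1)
                    jj = min(max(j + n, 0), cols - 1)
                    acc += image[ii][jj] * kernel[m + half][n + half]
            out[i][j] = acc
    return out


def gaussian_kernel(size=5, sigma=1.0):
    """Sampled 2-D Gaussian, normalised so that its weights sum to 1."""
    half = size // 2
    raw = [[math.exp(-(m * m + n * n) / (2.0 * sigma * sigma))
            for n in range(-half, half + 1)] for m in range(-half, half + 1)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]


SOBEL_COL = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # derivative across columns
SOBEL_ROW = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # derivative across rows


def gradient(image):
    """Return gradient amplitude (Eq. 4) and direction in radians (Eq. 5)."""
    d_col = correlate(image, SOBEL_COL)
    d_row = correlate(image, SOBEL_ROW)
    amplitude = [[math.hypot(a, b) for a, b in zip(ra, rb)]
                 for ra, rb in zip(d_col, d_row)]
    direction = [[math.atan2(b, a) for a, b in zip(ra, rb)]
                 for ra, rb in zip(d_col, d_row)]
    return amplitude, direction


# Vertical step edge: dark left half, bright right half.
step = [[0, 0, 0, 100, 100, 100] for _ in range(5)]
smoothed = correlate(step, gaussian_kernel())   # Eq. (3); the kernel is symmetric
amplitude, direction = gradient(smoothed)       # Eqs. (4) and (5) after smoothing
raw_amplitude, raw_direction = gradient(step)   # same operators without smoothing
```

A full Canny detector would continue with non-maximum suppression and hysteresis thresholding, which are omitted here.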

Fig. 3 shows the results of Canny operator edge detection. This method identifies as many true edges of the image as possible, and the identified edges should be as close as possible to the actual image edges. The preprocessing steps of image filtering and edge detection provide convenience for image cleaning, further improving the accuracy of flame image recognition and detection. These steps provide necessary preparations for subsequent image processing, optimizing the efficiency of the entire image recognition and detection process. In computer vision technology, the application of image segmentation and recognition algorithms can provide real-time status assessment and feedback for the waste incineration process. By detecting the shape and position of the flame in real-time, the system can adjust the incineration conditions in a timely manner and optimize combustion efficiency. By combining image processing with deep learning, the system can quantitatively evaluate the combustion situation during the incineration process, such as calculating indicators such as the proportion of combustion area and combustion intensity, and providing real-time feedback on the working status of the incinerator, providing decision-making basis for operators. In addition, abnormal combustion such as excessive flames, abnormal temperature, and incomplete combustion during waste incineration may pose a threat to the environment and safety. Computer vision systems can achieve early warning by analyzing abnormal phenomena in images.

2.2. IS-IR Algorithm based on CVT

In CVT, the IS-IR algorithm is crucial for understanding and parsing image content. By accurately segmenting and recognizing images, key information in the image can be effectively extracted. This provides new possibilities for IWI processing monitoring systems. With the help of the IS-IR algorithm, the monitoring system for waste incineration treatment can extract valuable information from complex images and monitor and precisely control the incineration process in real time, thereby improving processing efficiency and safety. U-Net is a CNN architecture specifically designed for image segmentation. Its name comes from its U-shaped structure, which consists of a down-sampling path and an up-sampling path [14]. The structure of the U-Net image segmentation algorithm is shown in Fig. 4.

Fig. 4. The network structure of U-net image segmentation algorithm.

../../Resources/ieie/IEIESPC.2026.15.3.309/fig4.png

In Fig. 4, the U-Net algorithm mainly consists of a contraction path and an expansion path. The contraction path is based on a typical CNN and effectively captures image context information and learns features. The expansion path restores object boundaries through upsampling, convolution, and skip connections, preserving detailed information [15]. In IWI processing monitoring, U-Net achieves precise flame segmentation and recognition. However, it may be limited when faced with complex waste feature recognition tasks such as fine-grained shape, size, and state analysis. Each version of the YOLO model offers models of different sizes, such as nano, small, medium, and huge. Although these sizes follow a similar ordering in weight size and execution time across versions, some newer versions of YOLO achieve a better balance between accuracy and execution time while keeping computational complexity low. For example, YOLOv5 improves accuracy and shortens execution time by optimizing the network structure. Therefore, this study adopts the YOLOv5 recognition algorithm. YOLOv5 is known for its real-time performance and high accuracy in object detection and can quickly and accurately identify the position and category of target objects [16]. Its focus on global information makes it perform better than U-Net on object detection problems under different scales, rotations, and lighting conditions [17]. The YOLOv5 network architecture is shown in Fig. 5.

Fig. 5. YOLOv5 network architecture.

../../Resources/ieie/IEIESPC.2026.15.3.309/fig5.png

In Fig. 5, when processing the object detection task, the YOLOv5 algorithm first extracts the features of the input image through the backbone network. These features are then fed into the neck network for deeper feature processing. Finally, the processed features are input into the head network. In the head network, the algorithm determines the target category and corrects the coordinates of candidate boxes based on position offset, thereby obtaining more accurate object detection results [18]. On the basis of the traditional YOLOv5 flame detection framework, the study used RGB, HSV, and Lab color spaces for feature extraction of flame colors. The RGB color space provides intuitive color distribution information, the HSV space helps distinguish brightness, saturation, and hue, while the Lab color space can more accurately capture flame color changes under different lighting conditions. In order to capture the color information of flames, this study uses color clustering to classify flame regions and conducts fine-grained analysis of flame regions at different combustion stages. The system can identify the trend of combustion state changes based on the captured flame color information, and assist in optimizing the incineration process according to the combustion state. The loss function expression of YOLOv5 algorithm is shown in Eq. (6).

(6)
$ L = 1-IoU + \frac{\alpha^2}{\beta^2} +\varepsilon\gamma. $

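In Eq. (6), $L$ represents the loss function. $IoU$ represents the intersection over union of the predicted box and the true box. $\alpha$ represents the Euclidean distance between the center points of the predicted box and the true box, and $\beta$ represents the diagonal length of the smallest box enclosing both. $\varepsilon$ represents an adjustable weighting parameter. $\gamma$ represents an auxiliary term that measures the consistency of the aspect ratios. The definition of the auxiliary term is shown in Eq. (7).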

(7)
$ \gamma = \frac{4}{\pi^2} \left(\arctan\left(\frac{h_t}{w_t}\right) - \arctan\left(\frac{h_p}{w_p}\right)\right)^2. $

In Eq. (7), $h_t$ and $w_t$ represent the height and width of the actual box. $h_p$ and $w_p$ represent the height and width of the predicted box, respectively. $\pi$ represents pi. The definition of adjustable parameters is shown in Eq. (8).

(8)
$ \varepsilon = \frac{\gamma}{1-IoU +\gamma}. $
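Eqs. (6) to (8) can be evaluated for a single pair of boxes as in the sketch below. It assumes corner-format boxes `(x1, y1, x2, y2)` and interprets $\alpha$ as the distance between box centers and $\beta$ as the diagonal of the smallest enclosing box, as in the complete-IoU formulation. The function name and the guard for the degenerate case of identical boxes are additions for illustration.

```python
import math


def box_loss(pred, target):
    """Loss of Eq. (6) for two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Intersection over union.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / union

    # alpha^2: squared distance between box centres (assumed interpretation).
    alpha_sq = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 \
        + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2
    # beta^2: squared diagonal of the smallest enclosing box (assumed interpretation).
    beta_sq = (max(px2, tx2) - min(px1, tx1)) ** 2 \
        + (max(py2, ty2) - min(py1, ty1)) ** 2

    # gamma, Eq. (7): aspect-ratio consistency term.
    gamma = (4 / math.pi ** 2) * (
        math.atan((ty2 - ty1) / (tx2 - tx1)) - math.atan((py2 - py1) / (px2 - px1))
    ) ** 2
    # epsilon, Eq. (8); guarded because 1 - IoU + gamma is 0 for identical boxes.
    epsilon = gamma / (1 - iou + gamma) if gamma > 0 else 0.0

    return 1 - iou + alpha_sq / beta_sq + epsilon * gamma


perfect = box_loss((0, 0, 2, 2), (0, 0, 2, 2))
shifted = box_loss((0, 0, 2, 2), (1, 1, 3, 3))
```

For the shifted pair the boxes share the same aspect ratio, so the loss reduces to $1 - IoU$ plus the normalized center distance term.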

Artificial neural networks draw on the attention mechanism of the human brain to process continuous inputs. Under this mechanism, neural networks can balance and focus on various input data according to task requirements, thereby more accurately extracting relevant information [19]. To enhance the performance of flame image recognition, this study innovates on the basis of the original YOLOv5 architecture. This improvement aims to capture local characteristics and contextual information of images more efficiently. Specifically, by integrating the SimAM attention module into the original YOLOv5, the model’s ability to identify flame image features has been further improved. As an advanced attention mechanism module, the SimAM module is capable of conducting a comprehensive investigation into the interrelationships between features, thereby enabling the model to prioritize target features and enhance the precision of the prediction [20]. The energy function defined by the SimAM attention module is expressed as Eq. (9).

(9)
$ e_t(c_t,d_t, y, x_i) = (y_t -\hat{t})^2 + \frac{1}{M -1} \sum_{i=1}^{M-1} (y_0 -\hat{x}_i)^2. $

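In Eq. (9), $e_t$ represents the energy function. $t$ and $x_i$ represent the target neuron and the other neurons of the input feature in a single channel, and $\hat{t} = c_t t + d_t$ and $\hat{x}_i = c_t x_i + d_t$ are their linear transforms. $y_t$ and $y_0$ are the labels assigned to the target neuron and the other neurons, respectively. $M$ represents the number of neurons on the channel. $c_t$ and $d_t$ are the weight and deviation of the transform, respectively. The final energy function after regularization is shown in Eq. (10).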

(10)
$ e_t(c_t,d_t, y, x_i) = \frac{1}{M -1} \sum_{i=1}^{M-1} (-1-(c_t x_i +d_t))^2 + (1-(c_t t +d_t))^2. $

In Eq. (10), there exists an energy function for each channel, which requires a large amount of computation. Therefore, the closed form solutions for weight and deviation can be calculated first. The average value of neurons $v_t$ is shown in Eq. (11).

(11)
$ v_t = \frac{1}{M -1} \sum_{i=1}^{M-1} x_i. $

The variance of neurons $\sigma^2_t$ is shown in Eq. (12).

(12)
$ \sigma^2_t = \frac{1}{M -1} \sum_{i=1}^{M-1} (x_i - v_t)^2. $

The closed form solution of the weight is shown in Eq. (13).

(13)
$ c_t = \frac{2(t -v_t)}{(t -v_t)^2 +2\sigma^2_t}. $

The closed form solution of the deviation is shown in Eq. (14).

(14)
$ d_t = -\frac{1}{2} (t +v_t)c_t. $

As shown in Eqs. (13) and (14), the closed form solutions for weights and biases can be obtained in a single channel, so it is reasonable to assume that all pixels in that channel follow the same distribution. Based on this assumption, it is possible to calculate the mean and variance of all neurons and reuse all neurons on that channel. This approach can significantly reduce computational costs and avoid iterative calculations of weights and biases for each position. The minimum energy function is expressed as Eq. (15).

(15)
$ e'_t = \frac{4\hat{\sigma}^2}{(t - \hat{v})^2 + 2\hat{\sigma}^2}, \quad \hat{v} = \frac{1}{M} \sum_{i=1}^{M} x_i, \quad \hat{\sigma}^2 = \frac{1}{M} \sum_{i=1}^{M} (x_i - \hat{v})^2. $
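A minimal sketch of how the minimum energy in Eq. (15) can be turned into attention weights for one channel is given below. Following the original SimAM formulation, the inverse energy is passed through a sigmoid and multiplied with the feature value. The small constant `lam` is an assumption added for numerical stability and does not appear in the equations above, and the function name and test channel are hypothetical.

```python
import math


def simam_gates(channel, lam=1e-4):
    """Per-neuron attention gates for one feature-map channel (2-D list).

    Uses the channel-wide mean and variance, as in Eq. (15), so no iterative
    solution of the weight and deviation is needed for each position.
    """
    values = [v for row in channel for v in row]
    count = len(values)
    mean = sum(values) / count
    var = sum((v - mean) ** 2 for v in values) / count

    gates = []
    for row in channel:
        gate_row = []
        for t in row:
            # Minimum energy of Eq. (15), with lam added (assumption).
            energy = 4 * (var + lam) / ((t - mean) ** 2 + 2 * var + 2 * lam)
            importance = 1.0 / energy              # low energy = distinctive neuron
            gate_row.append(1.0 / (1.0 + math.exp(-importance)))  # sigmoid
        gates.append(gate_row)
    return gates


def apply_simam(channel, lam=1e-4):
    """Scale every neuron by its gate."""
    gates = simam_gates(channel, lam)
    return [[t * g for t, g in zip(row, gate_row)]
            for row, gate_row in zip(channel, gates)]


# One bright activation on a flat background, e.g. a flame-like response.
channel = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
gates = simam_gates(channel)
refined = apply_simam(channel)
```

The neuron that differs most from the channel mean receives the largest gate, which is the behaviour the attention module relies on to emphasize flame features.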

The architecture of the IWI treatment monitoring system based on computer vision is shown in Fig. 6. In the system architecture, image median filtering is first applied to accentuate salient features and delineate the flame boundary. Subsequently, the Canny operator is used for edge detection to extract the edge information of the flame particle trajectory. Thereafter, the U-Net image segmentation algorithm is deployed to accurately segment and recognize the flame. Finally, the YOLOv5 image recognition algorithm is employed to identify the position and category of target objects. The specific steps of flame image processing are as follows. First, the three-channel BGR flame image is converted into a single-channel grayscale image to reduce computational complexity and highlight the brightness information of the flame. Next, the time and date information displayed at the top of the grayscale image is removed so that non-flame features do not interfere with the edge detection results. Then, the Canny operator performs edge detection on the flame grayscale image to obtain the edge contours of the flame particles. After that, the Hough line transformation fits the detected trajectory of the flame particles with straight lines, yielding a clearer combustion trajectory image. Finally, the system monitors the incineration process by combining the motion trajectory of flame particles with the detection results of the target object.

Fig. 6. Architecture of industrial waste incineration treatment monitoring system based on computer vision.

../../Resources/ieie/IEIESPC.2026.15.3.309/fig6.png
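The flame image processing steps described above (grayscale conversion, removal of the timestamp band, edge extraction, and Hough line fitting) can be sketched as follows. The luma weights are the common BT.601 values, the number of blanked rows and the synthetic frame are assumptions, and a simple brightness threshold stands in for the Canny operator to keep the example short.

```python
import math


def bgr_to_gray(image):
    """Convert rows of (B, G, R) tuples to grayscale with BT.601 luma weights."""
    return [[0.114 * b + 0.587 * g + 0.299 * r for (b, g, r) in row]
            for row in image]


def blank_top_rows(gray, num_rows):
    """Zero the top band where the time and date overlay is assumed to sit."""
    return [[0.0] * len(row) if i < num_rows else list(row)
            for i, row in enumerate(gray)]


def hough_strongest_line(edges, theta_steps=180):
    """Vote in (rho, theta) space and return (rho, theta in degrees, votes).

    Each edge pixel (x, y) votes for every line rho = x*cos(theta) + y*sin(theta)
    passing through it. `edges` must contain at least one edge pixel.
    """
    votes = {}
    for y, row in enumerate(edges):
        for x, is_edge in enumerate(row):
            if not is_edge:
                continue
            for k in range(theta_steps):
                theta = math.pi * k / theta_steps
                rho = round(x * math.cos(theta) + y * math.sin(theta))
                votes[(rho, k)] = votes.get((rho, k), 0) + 1
    (rho, k), count = max(votes.items(), key=lambda item: item[1])
    return rho, 180.0 * k / theta_steps, count


# Synthetic 10 x 40 frame: a fake timestamp in row 0 and a bright streak in row 5.
frame = [[(0, 0, 0)] * 40 for _ in range(10)]
frame[0] = [(255, 255, 255)] * 6 + [(0, 0, 0)] * 34
frame[5] = [(255, 255, 255)] * 40

gray = blank_top_rows(bgr_to_gray(frame), num_rows=2)
edges = [[value > 128 for value in row] for row in gray]  # stand-in for Canny
rho, theta_deg, votes = hough_strongest_line(edges)
```

Here the strongest line has `theta_deg` of 90 and `rho` of 5, that is, the horizontal streak in row 5, and the blanked timestamp pixels contribute no votes.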

3. Verification of Waste Incineration Treatment Monitoring System Based on Computer Vision

This section first sets up the experimental environment and then verifies and analyzes the improved YOLOv5 algorithm, the image segmentation algorithm, and the performance of the entire system in practical applications.

3.1. Experimental Environment Setup and YOLOv5 Algorithm Validation

To verify the performance of the CVT-based IWI processing monitoring system, this study uses C++ as the underlying development language to build an experimental platform. The specific software configuration and system environment are shown in Table 1. The experimental setup limits the number of iterations to no more than 300 and selects a learning rate of 0.001. In addition, the batch size is set to 8. This is because in preliminary experiments, it is found that after more than 300 iterations, the performance improvement of the model becomes insignificant and overfitting may occur. Therefore, limiting the number of iterations to 300 can strike a balance between ensuring model performance and preventing overfitting. It is advisable to avoid excessively high learning rates, as these may result in the model oscillating or failing to converge during training. Similarly, excessively low learning rates may lead to prolonged training times and the potential for the model to fall into a local optimum. A learning rate of 0.001 has been proven to be an ideal choice for achieving a balance between stable convergence and training time. In addition, smaller batch sizes can update model parameters more frequently, thereby responding faster to changes in the data. However, a batch size that is too small will increase training time and computational overhead. Therefore, to effectively utilize the computing power of GPU, the batch processing size is set to 8 after comprehensive consideration. When processing flame images inside the incinerator, it is necessary to preprocess the images first. The acquisition and structuring of training data are as follows: Firstly, a high-resolution industrial camera is used to capture real-time flame images inside the incinerator. After obtaining the images, a noise filter is used to denoise them, and then they are normalized to reduce pixel values to a unified range for model training. 
Finally, the image size is standardized by fixing the resolution, which guarantees uniform input data. After these operations, the images are divided into training and validation sets in a 4:6 ratio for model training and validation. This study mainly uses Average Precision (AP), Mean Average Precision (mAP), and Frames Per Second (FPS) as evaluation metrics to measure algorithm performance. AP is a commonly used metric that measures the accuracy of a detection algorithm for a single target category and reflects the accuracy and stability of the algorithm on that category. The mAP is a comprehensive evaluation metric for multi-class object detection that evaluates the overall performance of an algorithm across categories and better reflects its generality and adaptability in complex scenes. FPS is the most intuitive indicator of detection speed and refers to the number of image frames that an algorithm can process per second, reflecting the system’s ability to respond quickly and detect in a timely manner. In practical applications, the standard for real-time detection is usually FPS≥30.
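The three metrics can be computed as in the following sketch. AP is taken here as the area under the precision-recall curve with all-point interpolation, which is one common convention, and the study does not state which variant it uses. The detection lists and class names are hypothetical.

```python
def average_precision(detections, num_ground_truth):
    """Area under the precision-recall curve for one class.

    detections: list of (confidence, is_true_positive) pairs.
    num_ground_truth: number of annotated objects of this class.
    """
    ranked = sorted(detections, key=lambda det: det[0], reverse=True)
    precisions, recalls = [], []
    tp = fp = 0
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)

    # Make precision non-increasing from left to right (precision envelope).
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """mAP: mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


def frames_per_second(num_frames, elapsed_seconds):
    """FPS: number of frames processed per second."""
    return num_frames / elapsed_seconds


# Hypothetical detections for one class with two annotated objects.
ap_flame = average_precision([(0.9, True), (0.8, False), (0.7, True)], 2)
map_value = mean_average_precision({"flame": ap_flame, "smoke": 0.5})
is_real_time = frames_per_second(87, 1.0) >= 30
```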

Table 1. Specific software configuration and system environment.

System environment | Configuration
Development language | C++
Framework | Qt Creator
CPU | Intel i9-9900K
Graphics memory | 6 GB
GPU | NVIDIA GeForce GTX 1080Ti
Operating system | Windows 10

3.2. Performance Verification of Waste Incineration Treatment Monitoring System Based on CVT

The experiment uses a dataset of the waste incineration treatment process, collected from the real-time combustion conditions in the waste incineration area of China Energy Conservation and Environmental Protection Group Co., Ltd. The images in the dataset cover different types of incinerators, waste types, combustion states, and other scenarios. All images have undergone preprocessing and annotation, with annotation information including waste category, location, and bounding box coordinates. During training, the Adam optimizer is used, and a total of 100 iterations are conducted while the changes in the loss function are monitored, so that signs of overfitting can be identified and avoided. To verify the application effect of CVT in the waste incineration treatment monitoring system, the performance of the improved YOLOv5 algorithm is evaluated first. This study uses the Precision Recall Curve (P-R curve) as the evaluation tool. The P-R curve of the improved YOLOv5 algorithm is displayed in Fig. 7. The AUC value of this algorithm reaches 0.987. In the overall evaluation of all categories, its mAP@0.5 value is 0.877. Both values indicate that the improved YOLOv5 algorithm has a high level of image recognition accuracy.

Fig. 7. P-R Curve for improving YOLOv5 image recognition algorithm.

../../Resources/ieie/IEIESPC.2026.15.3.309/fig7.png

To further verify the superiority of the improved YOLOv5 image recognition algorithm, this algorithm is compared and validated with commonly used image recognition algorithms such as traditional YOLOv5 algorithm, Faster Regions with CNN features (Faster R-CNN) algorithm, Single Shot MultiBox Detector (SSD) algorithm, Mask Regions with CNN features (Mask R-CNN) algorithm, improved YOLOv3, and traditional methods. In addition, to provide a more comprehensive view of the current status of image recognition technology, the study also compares and analyzes recent related methods, including the Scalable and Efficient Object Detection (EfficientDet) algorithm based on the EfficientNet backbone network and BiFPN feature pyramid, as well as the Detection Transformer (DETR) object detection algorithm based on the Transformer architecture. The specific experimental result data obtained are compared and analyzed, and the performance comparison of different image recognition algorithms is shown in Table 2. In the flame detection process, the improved YOLOv5 algorithm has the shortest flame detection delay time, only 0.023s, which is significantly lower than other algorithms. In the process of industrial waste incineration, lower latency helps to improve the real-time performance and response speed of the system, enabling rapid adjustment of the operating parameters of the incinerator and enabling the system to provide quick feedback when the flame changes. From the perspective of mAP, the improved YOLOv5 algorithm achieves 95.0%, which is 2.9%, 2.7%, 5.0%, and 2.1% higher than YOLOv5, Faster R-CNN, SSD, and Mask R-CNN algorithms, respectively. Compared with EfficientNet and DETR algorithms, it has improved by 1.5% and 2.0% respectively. In terms of AP, the improved YOLOv5 algorithm achieves 95.8%, which is 4.3%, 3.7%, 6.7%, and 2.2% higher than other algorithms, respectively. Compared with EfficientNet and DETR algorithms, it has improved by 2.7% and 3.3%, respectively. 
In terms of detection speed, the improved YOLOv5 algorithm achieves 87 FPS, which is 31.8%, 19.2%, 201.0%, and 29.9% faster than YOLOv5, Faster R-CNN, SSD, and Mask R-CNN, respectively, and 24.3% and 33.8% faster than EfficientDet and DETR. Overall, the improved YOLOv5 algorithm has clear performance advantages. Compared with the traditional detection method in Table 2, it raises mAP from 77.5% to 95.0% and AP from 78.0% to 95.8%, gains of 17.5 and 17.8 percentage points, and raises detection speed from 23 FPS to 87 FPS, about 3.8 times as fast. The improved YOLOv3 designed by Zhao et al. also achieves notable gains in accuracy and efficiency over traditional algorithms [6]. However, its gains over the traditional method are smaller: 8.9 and 9.3 percentage points in mAP and AP, with detection speed rising only from 23 FPS to 25 FPS. The approach in this study therefore delivers the larger improvement.

Table 2. Performance comparison of different image recognition algorithms.

| Algorithm | mAP@0.5 (%) | AP (%) | Detection speed (FPS) | Flame detection delay (s) |
|---|---|---|---|---|
| YOLOv5 | 92.1 | 91.5 | 66.0 | 0.045 |
| Faster R-CNN | 92.3 | 92.1 | 73.0 | 0.032 |
| SSD | 90.0 | 89.1 | 28.9 | 0.065 |
| Mask R-CNN | 92.9 | 93.6 | 67.0 | 0.041 |
| Improved YOLOv5 | 95.0 | 95.8 | 87.0 | 0.023 |
| Improved YOLOv3 | 86.4 | 87.3 | 25 | 0.074 |
| Traditional method | 77.5 | 78.0 | 23 | 0.125 |
| EfficientDet | 93.5 | 93.1 | 70 | 0.052 |
| DETR | 93.0 | 92.5 | 65 | 0.059 |
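The comparisons quoted from Table 2 mix two kinds of difference: accuracy gaps are absolute (percentage points), while speed gaps are relative to the baseline. The following minimal sketch recomputes both for a subset of rows. The dictionary layout and the helper name `compare` are illustrative assumptions, not part of the study's software.

```python
# Values copied from Table 2: (mAP@0.5 %, AP %, speed in FPS, delay in s).
TABLE_2 = {
    "YOLOv5": (92.1, 91.5, 66.0, 0.045),
    "Faster R-CNN": (92.3, 92.1, 73.0, 0.032),
    "SSD": (90.0, 89.1, 28.9, 0.065),
    "Mask R-CNN": (92.9, 93.6, 67.0, 0.041),
    "Improved YOLOv5": (95.0, 95.8, 87.0, 0.023),
}


def compare(reference, baseline, table=TABLE_2):
    """Return (mAP gain in points, AP gain in points, relative speed-up in %)."""
    r_map, r_ap, r_fps, _ = table[reference]
    b_map, b_ap, b_fps, _ = table[baseline]
    return (
        round(r_map - b_map, 1),                    # absolute difference
        round(r_ap - b_ap, 1),                      # absolute difference
        round(100.0 * (r_fps - b_fps) / b_fps, 1),  # relative to the baseline
    )


if __name__ == "__main__":
    for name in TABLE_2:
        if name != "Improved YOLOv5":
            print(name, compare("Improved YOLOv5", name))
```

Keeping the two conventions in separate expressions makes it easier to check each quoted figure against the table row it came from.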

To verify the performance of the U-Net image segmentation algorithm, this study trains it with an initial learning rate of 0.01, a minimum learning rate of 0.001, and 50 training iterations. Fig. 8 shows the training curves of the U-Net segmentation algorithm. In Fig. 8(a), the training loss, validation loss, smoothed training loss, and smoothed validation loss all converge to about 0.075 and stabilize after approximately 30 iterations. This indicates that training reduces the error on both the training and validation data, which supports the stability and reliability of the algorithm. In Fig. 8(b), the mean intersection over union (mIoU) of the algorithm levels off after about 20 iterations and ultimately reaches 97.2%. In summary, under these settings the U-Net algorithm is both efficient and accurate, making it suitable for complex image segmentation tasks.
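The average intersection-to-union ratio reported for Fig. 8(b) is commonly abbreviated mIoU. As a minimal sketch of how such a score is usually computed from a predicted and a reference label mask, assuming integer class labels and averaging only over classes that appear in at least one of the two masks (the study does not state its exact averaging rule):

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Mean intersection over union across classes present in either mask."""
    pred = np.asarray(pred)
    target = np.asarray(target)
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both masks; skip to avoid 0/0
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else float("nan")


if __name__ == "__main__":
    # Toy 2x4 masks: class 0 = background, class 1 = flame (hypothetical labels).
    target = np.array([[0, 0, 1, 1],
                       [0, 0, 1, 1]])
    pred = np.array([[0, 0, 1, 1],
                     [0, 1, 1, 1]])
    print(mean_iou(pred, target, num_classes=2))
```

In the toy example one background pixel is mislabelled as flame, so both per-class IoU values fall below 1 and the mean reflects errors in either class.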

Fig. 8. Loss function curve of the U-Net image segmentation algorithm.


3.3. Practical Application Verification of IWI Processing Monitoring Software based on Computer Vision

To verify the effect of CVT in monitoring IWI processing, this study conducts an empirical test using incinerator videos. One frame is extracted per second, and the frames are fed into the monitoring software for analysis and recognition. The feeder of the incinerator releases industrial waste every 100 seconds. Each release raises dust, which in turn changes the appearance of the bright flame. Fig. 9 shows the recognition results for the bright flame area. The bright area decreases approximately every 100 seconds, which matches the feeding interval of the incinerator and supports the feasibility and accuracy of the monitoring system in flame recognition: the system tracks changes in the flame area during incineration, and these changes line up with the operating cycle. The analysis of the combustion area also reveals differences in combustion intensity between regions. For example, at a running time of 800 seconds, the combustion intensity on the right side is significantly higher than on the left, which may be due to uneven distribution of the waste. In addition, the lower combustion area of the incinerator remains stable throughout the process, with minimal change in area. The evaluation also has limitations. Although the system performs well in most cases, accurately identifying flames remains difficult in complex conditions, such as heavy dust or large changes in flame morphology, and this can reduce the recognition accuracy of the system.
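The link between the dips in bright flame area and the 100-second feeding interval can also be checked numerically. The sketch below is an illustration on synthetic data, not the study's method: it builds a hypothetical one-sample-per-second area series with a dip after every feed and estimates the dominant period from the autocorrelation peak. The function names, dip length, and noise level are all assumptions.

```python
import numpy as np


def dominant_period(series, min_lag=10):
    """Estimate the dominant period (in samples) from the autocorrelation peak."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    # Keep non-negative lags only: index 0 is lag 0, index k is lag k.
    ac = np.correlate(x, x, mode="full")[x.size - 1:]
    max_lag = x.size // 2
    return int(np.argmax(ac[min_lag:max_lag]) + min_lag)


def synthetic_bright_area(seconds=1000, feed_interval=100, dip_length=15):
    """Hypothetical bright-area fraction sampled once per second."""
    t = np.arange(seconds)
    area = np.full(seconds, 0.6)
    area[(t % feed_interval) < dip_length] = 0.3  # dust dims the flame after a feed
    rng = np.random.default_rng(0)
    return area + rng.normal(0.0, 0.01, seconds)


if __name__ == "__main__":
    series = synthetic_bright_area()
    print("estimated period (s):", dominant_period(series))
```

With real data, the series would be the per-frame bright-area values produced by the recognition step, and the estimated period could be compared against the known feeding schedule.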

Fig. 9. Identification results of bright flame areas.


To verify the ability of CVT to monitor waste combustion during IWI processing, the system is used to identify the proportion of the flame combustion area, as shown in Fig. 10. The horizontal axis in Fig. 10 represents the running time, and the vertical axis represents the area proportion of the left, right, upper, and lower combustion regions. In Fig. 10(a), at a running time of 800 s, the area proportions of the left and right combustion areas in the flame image are 20.2% and 74.6%, respectively. The flame intensity on the right side is therefore significantly higher than on the left, and the flame distribution is asymmetric. This may be caused by uneven distribution of the waste, which places more combustibles in certain areas and strengthens combustion locally. Uneven airflow inside the incinerator may also lead to different oxygen levels during combustion, which further affects the flame distribution. In Fig. 10(b), at a running time of 800 s, the area proportions of the upper and lower combustion areas are 47.1% and 1.1%, respectively. This indicates that the flames are mainly concentrated in the upper part of the incinerator, while the lower combustion area remains stable with minimal change in area. A likely reason is that the heat flow and gas generated by waste combustion rise, concentrating the flames in the upper area. In addition, the material in the lower combustion zone may already have burned out, so fewer flames are visible there and combustion in this area is relatively stable. In summary, the monitoring system can effectively identify and monitor the combustion process of the incinerator.
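The text does not state how the regional proportions in Fig. 10 are normalised. As one plausible reading, the sketch below takes a binary flame mask (for example, the output of the segmentation step) and reports the fraction of pixels in each image half that are marked as flame. The function name `half_ratios` and the half-image split are assumptions made for illustration.

```python
import numpy as np


def half_ratios(mask):
    """Fraction of flame pixels within the left, right, upper and lower halves."""
    m = np.asarray(mask, dtype=bool)
    h, w = m.shape
    return {
        "left": float(m[:, : w // 2].mean()),
        "right": float(m[:, w // 2:].mean()),
        "upper": float(m[: h // 2, :].mean()),
        "lower": float(m[h // 2:, :].mean()),
    }


if __name__ == "__main__":
    # Toy 4x4 mask with the flame concentrated in the upper right (hypothetical).
    mask = np.zeros((4, 4), dtype=bool)
    mask[0:2, 2:4] = True
    mask[1, 1] = True
    print(half_ratios(mask))
```

Under this definition the left and right values need not sum to 100%, because each is normalised by the area of its own half rather than by the total flame area.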

Fig. 10. The proportion of flame burning area.


To further validate the performance of the computer vision image recognition system, its recognition accuracy is compared with that of systems based on YOLOv5, Faster R-CNN, SSD, and Mask R-CNN. Fig. 11 shows the flame recognition accuracy of the different systems on the training and validation sets. On the training set, the recognition accuracy of the CVT-based system reaches 94.8%. The systems based on YOLOv5, Faster R-CNN, SSD, and Mask R-CNN reach 86.3%, 91.5%, 83.1%, and 87.9%, respectively, so the CVT-based system is higher by 8.5, 3.3, 11.7, and 6.9 percentage points. On the validation set, the recognition accuracy of the CVT-based system is 94.1%, which is 6.6, 4.1, 11.4, and 5.5 percentage points higher than the 87.5%, 90.0%, 82.7%, and 88.6% achieved by the other systems, respectively. Overall, the CVT-based system shows superior flame recognition accuracy.

Fig. 11. Comparison of flame recognition accuracy between different systems.


4. Discussion

The incineration of industrial waste is an important part of current industrial production. With advances in computing and image processing, computer vision-based detection of the combustion state has great practical value for monitoring and diagnosing boiler combustion. To strengthen monitoring of the combustion process, the U-Net algorithm was used to assess how completely the waste burns, and CVT was combined with high-temperature-resistant pinhole lenses and other monitoring equipment to build IWI processing monitoring software based on computer vision. The improved YOLOv5 algorithm showed high image recognition accuracy in terms of the P-R curve and AUC: the AUC value reached 0.987 and the mAP@0.5 value was 0.877, indicating that the algorithm identifies the bright areas of incinerator flames accurately and reliably. The computer vision-based flame detection method studied by Al Hakeem et al. performs well in fire recognition, but its recognition accuracy is lower than that obtained in this study [7].
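The area under a P-R curve for a single class is usually reported as AP, and mAP is the mean of AP over classes. The sketch below shows one common way to compute it from ranked detections, using all-point interpolation with a monotone precision envelope. The study does not state which interpolation it used, so this is an assumption, and the function name and inputs are illustrative.

```python
import numpy as np


def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve for one class (all-point interpolation)."""
    order = np.argsort(-np.asarray(scores, dtype=float))  # highest confidence first
    tp = np.asarray(is_true_positive, dtype=bool)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(~tp)
    recall = cum_tp / num_ground_truth
    precision = cum_tp / (cum_tp + cum_fp)
    # Add sentinel points, then make precision non-increasing in recall.
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]
    # Sum rectangle areas wherever recall increases.
    idx = np.nonzero(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))


if __name__ == "__main__":
    # Four hypothetical detections, three correct, four ground-truth flame regions.
    ap = average_precision(
        scores=[0.9, 0.8, 0.7, 0.6],
        is_true_positive=[True, False, True, True],
        num_ground_truth=4,
    )
    print(ap)
```

Whether a detection counts as a true positive depends on the IoU threshold; the "0.5" in mAP@0.5 refers to a threshold of 0.5 at that matching step, which happens before this function is called.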

The analysis of the flame combustion area showed that the improved YOLOv5 algorithm could effectively identify changes in the flame area during incineration. In particular, each time the incinerator pusher released industrial waste, which happened every 100 seconds, the bright flame area decreased, consistent with the actual operating interval. The monitoring system can not only identify changes in the flame area accurately but also detect differences in intensity between combustion regions. For example, at a running time of 800 seconds, the combustion intensity on the right side significantly exceeded that on the left, while the lower combustion area remained stable with minimal change in area. These results further support the effectiveness and accuracy of the improved YOLOv5 algorithm in practical applications. Compared with the traditional detection method, the improved YOLOv5 algorithm reached a detection speed of 87 FPS, about 3.8 times the traditional method's 23 FPS. Higher detection speed means not only greater computational efficiency when processing large amounts of image data but also a much shorter response time. Real-time monitoring and rapid response are crucial in industrial waste incineration, especially when the flame area changes frequently. With faster detection, the monitoring system can detect changes in the flame state sooner, provide timely feedback on the combustion situation inside the incinerator, and adjust the operating parameters of the incinerator accordingly.
With this efficient system feedback, management of the entire incineration process is strengthened, which not only improves combustion efficiency but also reduces energy consumption and pollutant emissions in waste incineration. The federated-learning-based computer vision security monitoring platform designed by Liu et al. also improves efficiency, but its improvement is less pronounced than that achieved in this study [8].

Overall, the improved YOLOv5 image recognition algorithm proved effective in the monitoring system for IWI treatment, improving both the accuracy and the efficiency of monitoring.

5. Conclusion

With the acceleration of industrialization, industrial waste treatment has become an increasingly prominent problem, and IWI is an important treatment method. To monitor the operating status of incinerators in real time and improve the efficiency and safety of waste incineration, this study developed IWI monitoring software based on CVT and optimized it using deep learning methods. The results showed that the improved YOLOv5 algorithm achieved an AUC value of 0.987 and an mAP@0.5 value of 0.877. The loss of the U-Net algorithm converged to about 0.075, and its mean intersection over union (mIoU) ultimately reached 97.2%. In terms of AP, the improved YOLOv5 algorithm achieved 95.8%, which was 4.3, 3.7, 6.7, and 2.2 percentage points higher than the original YOLOv5, Faster R-CNN, SSD, and Mask R-CNN, respectively. In terms of mAP, it achieved 95.0%, which was 2.9, 2.7, 5.0, and 2.1 percentage points higher than the same algorithms, respectively. In terms of detection speed, it reached 87 FPS, which was 31.8%, 19.2%, 201.0%, and 29.9% faster than the same algorithms, respectively. The recognition accuracy of the CVT-based system reached 94.8% on the training set and 94.1% on the validation set. Overall, the IWI processing monitoring software based on CVT improved the accuracy and efficiency of monitoring.

However, this study also has shortcomings, such as flame recognition in complex environments, which still needs improvement. Future research will consider introducing additional sensor data, such as temperature and pressure, for multi-modal analysis to further improve the accuracy and comprehensiveness of monitoring. Further algorithm optimization is also needed to cope with more complex working conditions and environmental changes. Addressing these shortcomings should yield a more comprehensive and accurate monitoring system that better supports the safe and efficient operation of IWI treatment.

References

[1] J.-F. Qiao, Z.-H. Guo, J. Tang, Dioxin emission concentration measurement approaches for municipal solid wastes incineration process: A survey, Acta Automatica Sinica, Vol. 46, No. 6, pp. 1063-1089, June 2020.
[2] E. de Titto, A. Savino, Environmental and health risks related to waste incineration, Waste Management & Research, Vol. 37, No. 10, pp. 976-986, July 2019.
[3] S. Khokhlov, Z. Abiev, V. Makkoev, The choice of optical flame detectors for automatic explosion containment systems based on the results of explosion radiation analysis of methane- and dust-air mixtures, Applied Sciences, Vol. 12, No. 3, pp. 1515-1523, December 2022.
[4] J. Ryu, D. Kwak, A study on flame and smoke detection algorithm using convolutional neural network based on deep learning, Journal of The Korean Society of Hazard Mitigation, Vol. 20, No. 1, pp. 223-232, February 2020.
[5] P. Barmpoutis, T. Stathaki, K. Dimitropoulos, N. Grammalidis, Early fire detection based on aerial 360-degree sensors, deep convolution neural networks and exploitation of fire dynamic textures, Remote Sensing, Vol. 12, No. 19, pp. 3177-3201, August 2020.
[6] Y.-Y. Zhao, J. Zhu, Y.-K. Xie, W.-L. Li, Y.-K. Guo, A real-time video flame detection algorithm based on improved YOLO-V3, Geomatics and Information Science of Wuhan University, Vol. 46, No. 3, pp. 326-334, March 2021.
[7] Z.-S. Al Hakeem, H.-I. Shahadi, H.-H. Abbas, An automatic flame detection system for outdoor areas, TELKOMNIKA (Telecommunication Computing Electronics and Control), Vol. 21, No. 4, pp. 864-871, August 2023.
[8] Y. Liu, A. Huang, Y. Luo, H. Huang, Y. Liu, Y. Chen, Q. Yang, Fedvision: An online visual object detection platform powered by federated learning, Proc. of the AAAI Conference on Artificial Intelligence, Vol. 34, No. 08, pp. 13172-13179, April 2020.
[9] C.-Z. Dong, F. N. Catbas, A review of computer vision-based structural health monitoring at local and global levels, Structural Health Monitoring, Vol. 20, No. 2, pp. 692-743, July 2021.
[10] J. Zhao, R. Masood, S. Seneviratne, A review of computer vision methods in network security, IEEE Communications Surveys & Tutorials, Vol. 23, No. 3, pp. 1838-1878, June 2021.
[11] G. Li, Y. Huang, Z. Chen, G.-D. Chesser, J.-L. Purswell, J. Linhoss, Y. Zhao, Practices and applications of convolutional neural network-based computer vision systems in animal farming: A review, Sensors, Vol. 21, No. 4, pp. 1492-1495, January 2021.
[12] J. Guo, H. He, T. He, L. Lausen, M. Li, H. Lin, Y. Zhu, GluonCV and GluonNLP: Deep learning in computer vision and natural language processing, The Journal of Machine Learning Research, Vol. 21, No. 1, pp. 845-851, February 2020.
[13] J. Janai, F. Güney, A. Behl, A. Geiger, Computer vision for autonomous vehicles: Problems, datasets and state of the art, Foundations and Trends in Computer Graphics and Vision, Vol. 12, No. 1-3, pp. 1-308, July 2020.
[14] Z. Wang, Q. She, T.-E. Ward, Generative adversarial networks in computer vision: A survey and taxonomy, ACM Computing Surveys, Vol. 54, No. 2, pp. 1-38, February 2021.
[15] D. Bhatt, C. Patel, H. Talsania, J. Patel, R. Vaghela, S. Pandya, H. Ghayvat, CNN variants for computer vision: History, architecture, application, challenges and future scope, Electronics, Vol. 10, No. 20, pp. 2470-2475, September 2021.
[16] H. Tian, T. Wang, Y. Liu, X. Qiao, Y. Li, Computer vision technology in agricultural automation-a review, Information Processing in Agriculture, Vol. 7, No. 1, pp. 1-19, March 2020.
[17] M.-H. Guo, T.-X. Xu, J.-J. Liu, Z.-N. Liu, P.-T. Jiang, T.-J. Mu, S.-M. Hu, Attention mechanisms in computer vision: A survey, Computational Visual Media, Vol. 8, No. 3, pp. 331-368, March 2022.
[18] P. Barmpoutis, P. Papaioannou, K. Dimitropoulos, N. Grammalidis, A review on early forest fire detection systems using optical remote sensing, Sensors, Vol. 20, No. 22, pp. 6442-6447, October 2020.
[19] B. Kim, J. Lee, A video-based fire detection using deep learning models, Applied Sciences, Vol. 9, No. 14, pp. 2862-2873, July 2019.
[20] H. Mokayed, T.-Z. Quan, L. Alkhaled, V. Sivakumar, Real-time human detection and counting system using deep learning computer vision techniques, Artificial Intelligence and Applications, Vol. 1, No. 4, pp. 221-229, October 2023.
Lan Zhu

Lan Zhu is a computer teacher in the Department of Basic Medical Education at Dazhou Vocational College of Chinese Medicine. He graduated from Chengdu University of Electronic Science and Technology in 2007, majoring in network engineering, and obtained a master’s degree in education in 2014. In 2020, he participated in the compilation of the Computer Application Foundation in the 13th Five-Year Plan of general vocational education; in 2021, he participated in the compilation of the Principles and Practice of Computer Network Security.

Deqiang Fei

Deqiang Fei is currently a teacher at Dazhou Vocational College of Chinese Medicine. He graduated from the University of Electronic Science and Technology of China, majoring in Computer Network. He has published two monographs on computer applications and numerous academic papers in China. His main research interests include network information security and virtualization technology.