Reject Ratio (2025): 81.5%

Development of a Computer Vision-based Monitoring System for Enhanced Industrial Waste Incineration Management

https://doi.org/10.5573/IEIESPC.2026.15.3.309

(Lan Zhu) ; (Deqiang Fei)

In modern industrial production, waste incineration is a key process, and how it is monitored and managed significantly affects the environment and resource utilization. To improve the efficiency and environmental safety of incineration, this study designs and develops computer vision-based monitoring software for industrial waste incineration. A high-temperature-resistant pinhole lens serves as the monitoring device, a U-Net algorithm detects the combustion adequacy of the waste, and an image segmentation algorithm monitors the incineration in real time. Flame images are collected through computer vision technology, and the flame morphology inside the incinerator is analyzed. In addition, a computer vision-based monitoring system is developed for the industrial waste incineration process, in which image processing and deep learning algorithms monitor and analyze incineration in real time. A SimAM attention module is added to the original YOLOv5 architecture to improve the model’s ability to recognize flame image features. On the training and validation sets, the recognition accuracy of the computer vision-based system was 94.8% and 94.1%, respectively. The improved YOLOv5 algorithm achieved an AUC of 0.987, an mAP@0.5 of 0.877, and an accuracy of 95.8%. These results indicate that computer vision-based monitoring software for industrial waste incineration significantly improves treatment efficiency and accuracy while effectively reducing human errors and omissions.
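
The SimAM attention step mentioned in this abstract can be illustrated with a minimal sketch. It follows the commonly published formulation of SimAM (a parameter-free weighting based on each activation's squared deviation from the channel mean) and is not the authors' implementation. The function name `simam_channel`, the single-channel list-of-lists input, and the default `e_lambda=1e-4` are assumptions made for illustration.

```python
import math


def simam_channel(fmap, e_lambda=1e-4):
    """Apply SimAM-style attention to one 2D feature map (list of rows).

    Hypothetical helper: operates on a single channel, pure Python.
    Requires at least two activations in the map.
    """
    values = [v for row in fmap for v in row]
    n = len(values) - 1                      # number of "other" activations
    mu = sum(values) / len(values)           # channel mean
    sq_dev = [[(v - mu) ** 2 for v in row] for row in fmap]
    var = sum(s for row in sq_dev for s in row) / n

    out = []
    for row, sq_row in zip(fmap, sq_dev):
        new_row = []
        for v, s in zip(row, sq_row):
            # Inverse-energy term: larger for activations far from the mean.
            y = s / (4.0 * (var + e_lambda)) + 0.5
            weight = 1.0 / (1.0 + math.exp(-y))  # sigmoid gate
            new_row.append(v * weight)
        out.append(new_row)
    return out


# A bright "flame-like" centre pixel receives a larger gate than its
# flat surroundings.
feature_map = [[1.0, 1.0, 1.0],
               [1.0, 9.0, 1.0],
               [1.0, 1.0, 1.0]]
attended = simam_channel(feature_map)
```

Because the gate is computed from statistics of the feature map itself, a module of this kind adds no trainable parameters, which is one reason it is often attached to existing detectors.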

Design of a Poster Visual Communication Text Information Extraction System Based on Semantic Segmentation and CNN

https://doi.org/10.5573/IEIESPC.2026.15.3.323

(Xianfeng Zeng)

With the development of the Internet and information technology, the rapid expansion of poster advertising driven by online e-commerce has challenged its standardized application and review. This research optimizes character detection, text recognition, and keyword extraction in optical character recognition systems to reduce audit costs and improve work efficiency. A convolutional semantic segmentation network detects text in images and establishes semantic segmentation channels to fuse semantic information, thereby improving image segmentation accuracy. To handle complex backgrounds in text recognition, a multi-scale sequence encoding recognition algorithm with attention was designed by combining the two-dimensional spatial features of images with the sequence characteristics of text. Experiments on different algorithms and types of information yielded recognition accuracy rates of 86%, 89%, and 93%, respectively. The recognition rates for the different types were 97.7%, 98.5%, 98.9%, and 98.6%, respectively. Finally, the model was run on both the server and client sides within the system's operational framework. These experimental results confirm the superior performance of the multi-scale sequence encoding model based on semantic segmentation and an attention mechanism. This provides a theoretical basis and technical reference for the recognition and shielding of text information in printed media.

Packaging Pattern Design Based on Relative Coordinates and Region Content Replacement

https://doi.org/10.5573/IEIESPC.2026.15.3.334

(Songyan Cui) ; (Yaxuan Qi)

In the domain of modern packaging design, traditional patterns, including the Zhuang brocade, are underutilized due to their simplicity and lack of innovation. Addressing this, our study employs computer image processing technology to enhance the application of Zhuang brocade patterns. By extracting key features such as color, pattern, and organizational form, we have developed a novel design method and process based on relative coordinates and regional content replacement. This approach facilitates the automatic generation and customization of Zhuang brocade patterns, offering a valuable reference for their design and promotion. Experimental validation demonstrated that our algorithm significantly improves the efficiency and quality of packaging pattern design, with an experimental sample segmentation accuracy of 0.993, reflecting excellent segmentation performance. The algorithm’s practical and promotional value is underscored by its ability to markedly enhance packaging pattern design efficiency. Future work will focus on expanding the dataset and further optimizing the design process.

Trends in Computer Vision Research

https://doi.org/10.5573/IEIESPC.2026.15.3.348

(Jaeseo Choi) ; (Changuk Choi) ; (Heera Ha) ; (Junghwan Kim) ; (Kyeongbo Kong) ; (Jaewon Royce Choi)

Numerous studies have been conducted to understand the research trends in the fields of artificial intelligence and machine learning. Although computer vision technology significantly influences the media domain, particularly the value chain of the video content industry, research identifying related studies and technical discussions remains insufficient. Therefore, this study collected articles published between 2017 and 2022 from major international conferences in computer vision, including CVPR, ECCV, and ICCV. Semantic network, institutional, and annual analyses of research trends across domains were conducted. The semantic network analysis of conference article titles revealed four distinct clusters. The analysis of collaborative research among countries revealed that computer vision research is primarily concentrated in the US and China, presenting the highest levels of research activity. The analysis of research trends based on years and affiliations indicates a gradual increase in collaborative research between universities and companies. The study identified annual research trends in computer vision. Further in-depth research is anticipated across diverse foundational and applied areas, fostering positive impacts on academic and industrial advancements.

Res2U-Net: Double Resnet on U-Net for Exudate Segmentation in Retinal Image

https://doi.org/10.5573/IEIESPC.2026.15.3.360

(Muhammad Arhami)

The appearance of exudate on the retina indicates diabetic retinopathy. An accurate segmentation method is needed to detect the presence of exudates, both hard and soft. The U-shaped network (UNet) is a segmentation architecture. The varied forms of exudates require an architecture with deep layers. However, adding layers to UNet can result in vanishing gradients during training. This study modifies the UNet architecture for exudate segmentation by replacing its encoder and decoder with residual blocks. The architecture is named a double residual block on a U-shaped network (Res2U-Net). The residual block allows gradients to flow directly across several layers without passing through non-linear operations, which can overcome vanishing gradients in UNet. The proposed architecture is expected to maintain the flow of important information and handle vanishing gradients in each layer, so that exudate segmentation in retinal images is optimal for both hard and soft exudates. Applying Res2U-Net to exudate segmentation produces an accuracy above 95%. F1-scores above 0.80 show that the proposed architecture achieves a good balance in separating exudate from unnecessary features. These results show that the proposed architecture can provide accurate and valid exudate segmentation results on retinal images.
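
The vanishing-gradient argument in this abstract can be made concrete with a toy scalar sketch. It is not the Res2U-Net architecture. It only assumes that each layer behaves locally like a function with slope `s`, so a plain stack multiplies gradients by `s` per layer while a residual layer `y = x + f(x)` multiplies them by `1 + s`. The function name and the slope value are hypothetical.

```python
def chain_gradient(slopes, residual):
    """Gradient of the output with respect to the input of a layer stack.

    slopes   -- local derivative f'(x) of each layer (toy scalar model)
    residual -- if True, each layer computes x + f(x) instead of f(x)
    """
    grad = 1.0
    for s in slopes:
        # Chain rule: d/dx [x + f(x)] = 1 + f'(x); d/dx [f(x)] = f'(x).
        grad *= (1.0 + s) if residual else s
    return grad


slopes = [0.1] * 20                       # 20 layers with small local slopes
plain = chain_gradient(slopes, residual=False)
skip = chain_gradient(slopes, residual=True)
print(plain < 1e-15, skip >= 1.0)         # True True
```

The identity term contributed by the skip connection keeps the product from collapsing toward zero, which is the property the abstract relies on when deepening the encoder and decoder.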

Research on the Application of NLP Algorithm Based on Multi-interaction Feature Fusion in English Writing Teaching

https://doi.org/10.5573/IEIESPC.2026.15.3.372

(Chunmei Qiao)

The research develops a system integrating Entity Framework, UI processing technologies, and access control models with a focus on natural language processing (NLP) for clause analysis, text segmentation, part-of-speech tagging, phrase cutting, and grammar checking. It examines system architecture, user demographics, and functional requirements, and builds a user-centric model. Core technologies covered include database structuring and dynamic/static modeling, with practical examples of how the system supports intelligent English writing training for college students. The study explores a method for integrating diverse behavioral characteristics in an information network to better predict user behavior by combining heterogeneous and homogeneous relationships. In a 16-week study with 300 high school students, one group received traditional training, while the other used an NLP-based multi-interaction feature fusion algorithm. Key interaction features analyzed include grammatical accuracy, lexical diversity, and semantic similarity, alongside short-term interests and temporal information. By using self-attention mechanisms to synthesize temporal and object information, the approach enhances recommendation performance, computational efficiency, and robustness, effectively supporting dynamic user behavior analysis within the training system.

Technical Action Recognition Algorithm for Cheerleaders Based on Inertial Sensors and Pose Motion Capture

https://doi.org/10.5573/IEIESPC.2026.15.3.385

(Yulin Kuang)

To address the low efficiency and strong subjectivity of cheerleading movement recognition, this paper proposes a recognition algorithm based on inertial sensors and an improved spatiotemporal hypergraph convolution network. First, high-precision 3D skeleton data is obtained by integrating inertial sensors with pose capture technology. Second, a self-attention mechanism is embedded in the hypergraph convolutional network to dynamically assign joint weights, and a time-sparse hypergraph and a channel-sparse hypergraph are designed to jointly optimize spatiotemporal features. The experimental results show that the accuracy rate, recall rate, and F1 value of the proposed algorithm in the benchmark test are 0.96, 0.98, and 0.97, respectively, which is significantly better than the traditional algorithm. In the practical application test, the recognition accuracy for swing movements is as high as 99.2%, and the average recognition time for four types of movements is less than 0.1 seconds. This study combines hypergraph convolution with spatiotemporal sparse optimization for the first time, providing a new method for complex multi-joint movement recognition and an efficient, objective technical tool for cheerleading training and competition evaluation.
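
As a quick consistency check on the reported metrics, the F1 value can be recomputed as the harmonic mean of precision and recall. This sketch assumes that the abstract's "accuracy rate" of 0.96 is meant as precision. That reading is an assumption, not something the abstract states.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


# Values taken from the abstract; 0.96 is treated as precision (assumption).
print(round(f1_score(0.96, 0.98), 2))  # 0.97
```

Under that reading the three reported numbers are mutually consistent.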

Implicit Neural Representations: A Holistic Survey of Techniques, Applications, and Challenges

https://doi.org/10.5573/IEIESPC.2026.15.3.396

(Sukhun Ko) ; (Chanho Eom) ; (Jihyong Oh)

Implicit Neural Representations (INRs) have emerged as a new paradigm for representing signals and scenes, demonstrating flexibility and strong performance across diverse applications. INRs model data as continuous implicit functions with multilayer perceptrons (MLPs), offering advantages such as resolution independence and memory efficiency beyond discretized data structures. This survey provides a comprehensive review of recent INR methodologies and introduces a taxonomy that categorizes existing approaches into six groups: (1) activation functions, (2) positional encoding, (3) Fourier-based reparameterization, (4) combined strategies, (5) implicit neural conditioning with prior knowledge, and (6) function decomposition with learnable operators. We analyze the core properties of INR models, highlighting their differentiability, compactness, and adaptability to varying resolutions, and discuss open challenges such as spectral bias and limitations in modeling high-frequency signals. Furthermore, we conduct a comparative analysis across representative methods to clarify similarities, differences, and design trade-offs. Through this synthesis, we aim to provide an accessible and structured reference that not only outlines the current landscape of INR research but also identifies gaps and opportunities for future exploration and advancement.
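
To illustrate the positional encoding category in this taxonomy, the sketch below shows one widely used form, where a coordinate is mapped to sines and cosines at geometrically increasing frequencies before being fed to the MLP. The function name, the choice of frequencies `2^k * pi`, and the parameter `num_freqs` are assumptions for illustration. The survey itself may cover other variants.

```python
import math


def positional_encoding(x, num_freqs):
    """Map a scalar coordinate x to 2 * num_freqs sinusoidal features.

    Hypothetical helper using frequencies 2^k * pi for k = 0..num_freqs-1.
    """
    features = []
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi
        features.append(math.sin(freq * x))
        features.append(math.cos(freq * x))
    return features


encoded = positional_encoding(0.25, num_freqs=4)
print(len(encoded))  # 8
```

Lifting low-dimensional coordinates into such a feature space is one common way to ease the spectral bias that the abstract lists among the open challenges.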

A Survey of Deep Learning-Based Network Anomaly Detection: Benchmarking on NSL-KDD, UNSW-NB15, and CICIDS2017

https://doi.org/10.5573/IEIESPC.2026.15.3.410

(Seoyeon Choi) ; (Songhye Kim) ; (Jihyeon Ryu)

Network anomaly detection is a critical component of cybersecurity, enabling the identification of potential infringements and malicious traffic. In recent years, machine learning and deep learning methods have been widely applied for this purpose. This study systematically reviews research employing the NSL-KDD, UNSW-NB15, and CICIDS2017 datasets and compares the performance of commonly used models. The analysis shows that convolutional neural networks (CNNs) achieve strong results on imbalanced, high-dimensional data by leveraging regional patterns and hierarchical feature learning, while long short-term memory (LSTM) networks excel in capturing temporal dependencies. Generative adversarial networks (GANs) further enhance detection performance by addressing data imbalance and producing realistic attack samples. However, CNNs struggle with long-term dependencies, LSTMs incur high computational costs for long sequences, and GANs face instability and mode collapse. To address these limitations, emerging approaches such as transformers, contrastive learning, and LLM-based multimodal frameworks are gaining attention. This paper highlights the strengths and weaknesses of CNNs, LSTMs, and GANs and outlines promising directions for next-generation network anomaly detection.

Electric Power Steering System Identification Using Artificial Neural Network for Autonomous Vehicles

https://doi.org/10.5573/IEIESPC.2026.15.3.426

(Rodi Hartono) ; (Hyun Rok Cha) ; (Hee Tae Chung) ; (Kyoo Jae Shin)

The steering system is crucial for autonomous vehicles to accurately convert angle inputs into motion trajectories. Among the types of steering systems, electric power steering (EPS) has gained widespread adoption due to its superior reliability, safety, and efficiency compared to hydraulic systems. Precise steering control necessitates a deep understanding of EPS dynamics, typically achieved through mathematical modeling or system identification (SI). However, traditional modeling methods often fall short in capturing the complex nonlinear behavior of EPS. To address this limitation, we propose an artificial neural network (ANN) model trained using backpropagation (BP) to represent the dynamic characteristics of EPS. The ANN is trained on real-world data collected from a physical EPS system. Extensive testing on diverse datasets demonstrates the exceptional performance of our proposed model, achieving a remarkable fit of over 99.6% with measured data. This significant improvement over conventional methods highlights the potential of the ANN-based BP model as a superior approach for SI in EPS.
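
The abstract does not say how the fit percentage is computed. One definition commonly used in system identification is the normalized root-mean-square error (NRMSE) fit, sketched below as an assumption rather than as the authors' metric. The function name and the sample data are hypothetical.

```python
import math


def fit_percent(measured, predicted):
    """NRMSE-based fit: 100 * (1 - ||y - y_hat|| / ||y - mean(y)||).

    Assumed metric; 100 means a perfect match, 0 means no better than
    predicting the mean of the measured signal.
    """
    mean_y = sum(measured) / len(measured)
    err = math.sqrt(sum((y - p) ** 2 for y, p in zip(measured, predicted)))
    spread = math.sqrt(sum((y - mean_y) ** 2 for y in measured))
    return 100.0 * (1.0 - err / spread)


measured = [0.0, 1.0, 2.0, 3.0]        # illustrative data, not from the paper
print(fit_percent(measured, measured))  # 100.0
```

With this definition, a fit above 99.6% would mean the residual norm is less than 0.4% of the measured signal's spread around its mean.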

A Practical Study of English Vocabulary Teaching Based on Virtual Augmented Reality Interaction Technology

https://doi.org/10.5573/IEIESPC.2026.15.3.438

(Tan Zhang)

In the context of the integration and innovation of IT and education, VR is applied extensively in the educational domain. This paper studies English vocabulary teaching based on virtual augmented reality interactive technology. First, the algorithmic basis of virtual augmented reality interaction technology is presented, including a data tracking algorithm and a positioning algorithm. Among them, compressed sensing is the process of using a measurement matrix A to measure a high-dimensional original signal X and obtain a low-dimensional measurement signal Y. Because the length of the measurement signal Y is far less than that of the original signal X, the signal X is compressed. The positioning algorithm simulates a person’s body and computes the locations of the joints. Second, based on these algorithms, an English vocabulary teaching system based on virtual augmented reality interactive technology is proposed and applied. The research results are as follows: (1) The compressed sensing tracking method is used to prevent the movement of components, and the tree structure of the virtual avatar skeleton model allows the spatial position to be solved more effectively. (2) An English vocabulary teaching system is proposed based on virtual augmented reality interactive technology and the algorithms studied above. (3) In terms of system stability and running speed, 70% of the students rated above 6, 80% over 8, and 60% over 9. Thus, the system is able to satisfy the needs of English vocabulary teaching. (4) The English vocabulary learning system with virtual augmented reality interaction technology improved the pass rate by 10%. These results show that applying the system in the EFL classroom can enhance learners’ English vocabulary. The study provides a new theoretical basis and practical path for English vocabulary teaching and promotes the application and development of educational technology.
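
The compressed sensing measurement step described in this abstract, Y = A X with Y much shorter than X, can be sketched as follows. The random Gaussian measurement matrix, the helper names, and the signal sizes are assumptions for illustration. The abstract does not specify how A is constructed, and signal recovery is not shown.

```python
import math
import random


def make_measurement_matrix(m, n, seed=0):
    """Build an m x n random Gaussian matrix A (assumed construction)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]


def measure(a, x):
    """Compute the low-dimensional measurement Y = A X."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


# Sparse high-dimensional signal X with two non-zero entries.
x = [0.0] * 16
x[3], x[11] = 1.5, -2.0

a = make_measurement_matrix(m=6, n=16)
y = measure(a, x)
print(len(x), len(y))  # 16 6
```

The compression comes purely from the shape of A: six measurements stand in for sixteen samples, which is only useful when X is sparse enough to be recovered later.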

High Speed Accelerators Hardware Implementation for Fully Connected Neural Network Model Using 3D Systolic Array Architecture

https://doi.org/10.5573/IEIESPC.2026.15.3.452

(Pottipati Dileep Kumar Reddy) ; (Kota Venakata Ramanaih)

A Convolutional Neural Network (CNN) is a primary building block for image processing applications, with subsystems such as the Convolution Layer (CL), Max Pooling Layer (MPL), and Fully Connected Neural Network (FCNN) layer. To address the computational complexity of the FCNN model in terms of processing speed, a hardware implementation on FPGA needs to be assessed and optimized. In this study, a systolic array algorithm-based 3D structure is developed to implement the FCNN model. The 3D structure processes multiple frames of input data with three filters to generate the FCNN output simultaneously using a multistage FCNN model. The processing elements that form the primary building block of the systolic array model are designed using basic arithmetic elements and a control circuit for data synchronization. Verilog HDL code is developed for the proposed model, along with a test bench to verify its functionality, and the 3D structure with pipelined logic is implemented on a Virtex-5 FPGA. From the synthesis report, the estimated operating frequency is 277 MHz, which is 27% faster than the direct implementation; power dissipation increases by 6% as a trade-off for computation speed. The 3D CNN structure is suitable for high-speed image processing applications.
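
The data flow of a systolic array can be sketched in software with a cycle-by-cycle simulation. The example below models a conventional 2D output-stationary array computing a matrix product, with operands skewed so that processing element (i, j) sees the pair `a[i][s]`, `b[s][j]` at cycle `s + i + j`. It is a simplified illustration, not the authors' 3D Verilog design, and the function name is hypothetical.

```python
def systolic_matmul(a, b):
    """Simulate an output-stationary systolic array computing a x b.

    a is n x k, b is k x m. Returns (result, cycles). Each processing
    element (i, j) performs at most one multiply-accumulate per cycle.
    """
    n, k, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]     # one accumulator per PE
    cycles = n + m + k - 2                # last operand pair arrives at t = cycles - 1
    for t in range(cycles):
        for i in range(n):
            for j in range(m):
                s = t - i - j             # which operand pair reaches PE (i, j) now
                if 0 <= s < k:
                    acc[i][j] += a[i][s] * b[s][j]
    return acc, cycles


result, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(result, cycles)  # [[19, 22], [43, 50]] 4
```

In hardware all processing elements work in parallel within a cycle, so the latency scales with `n + m + k` rather than with the `n * m * k` multiply-accumulates a sequential implementation would execute.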