To detect and handle external damage to transmission lines efficiently and accurately, this paper first introduces the Convolutional Block Attention Module (CBAM)
to improve YOLOv8. A transmission line image detection model based on the improved
YOLOv8, named YOLOv8-CBAM, is then constructed. Finally, a new transmission line external
force damage prevention system integrating YOLOv8 and an Intrusion Detection Algorithm (IDA) is proposed by combining
the YOLOv8-CBAM-PCA transmission line image detection model with IDA.
2.1. Transmission Line Image Detection Model Based on YOLOv8-CBAM
Traditional image recognition mostly uses edge detection operators, such as Prewitt,
Sobel, and Canny. These methods rely heavily on hand-crafted operator design and are
not universally applicable [15]. YOLOv8 is a single-stage detection algorithm based on regression: it
directly convolves the input image data and labels the image category [16]. YOLOv8 combines good recognition accuracy with high real-time performance,
making it well suited to image detection on transmission lines. Therefore, this study
introduces YOLOv8 to detect foreign object images on transmission lines. Fig. 1 shows the structure of YOLOv8.
Fig. 1. Network structure of YOLOv8.
In Fig. 1, the input layer first feeds the foreign object image information
of the transmission line to be detected into the network backbone module (CSPDarkNet-53).
The CSPDarkNet-53 module layer then extracts features from this foreign object image
information. Next, the feature enhancement module layer
pools and fuses the input transmission line foreign object feature maps.
Finally, the output layer outputs the abnormal image information.
The CSPDarkNet-53 module layer of YOLOv8 includes four modules: feature extraction
layer (Focus), convolutional layer (Conv), Spatial Pyramid Pooling Fast (SPPF), and
BottleneckCSP. Fig. 2 shows the CSPDarkNet-53 module layer of YOLOv8.
Fig. 2. CSPDarkNet-53 module.
In Fig. 2, the Focus module mainly consists of a 3×3 convolution (Conv) and is responsible for slicing
the input transmission line foreign object image data into four parts, concatenating
them along the channel dimension, and convolving the resulting feature map. The Focus
module serves to reduce spatial resolution while simultaneously increasing the number
of channels; because the slicing is implemented with tensor reshaping operations, it
also reduces the cost of the associated convolution computations. The Conv module performs 2D convolution,
Batch Normalization (BN), and activation. The BottleneckCSP module is responsible
for extracting deep semantic information from transmission line images. SPPF is mainly
composed of three serial 5×5 pooling kernels and is responsible for enlarging the receptive
field and separating important contextual features. The BN calculation is represented
by Eq. (1).
In Eq. (1), $\mu_{B}$ and $\sigma_{B}^{2}$ denote the mean and variance of the input tensor over the batch,
respectively. $x_{i}^{*}$ represents the normalized output value, $x_{i}$ represents
the value of an element in the input feature map, and $\varepsilon$ is a small constant that
prevents the variance term from being 0. However, the values after the BN operation
are concentrated around 0. Therefore, to enhance the nonlinearity of the network,
the normalized output is scaled and translated. The BN calculation after scaling
and translation is represented by Eq. (2).
In Eq. (2), $\gamma$ and $\beta$ represent the scaling ratio and offset, respectively. The maximum
pooling operation is represented by Eq. (3).
In Eq. (3), $y_{m,n}^{d}$ refers to the maximum output value of the transmission line characteristic
map, and $R_{m,n}^{d}$ refers to the pooled region of the characteristic map.
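The BN transform of Eqs. (1)-(2) and the max pooling of Eq. (3) can be sketched in NumPy; this is an illustrative implementation under standard conventions, not the paper's code:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Eqs. (1)-(2): normalize over the batch, then scale by gamma and shift by beta."""
    mu = x.mean(axis=0)                    # mu_B: per-feature batch mean
    var = x.var(axis=0)                    # sigma_B^2: per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # Eq. (1): normalized output x_i^*
    return gamma * x_hat + beta            # Eq. (2): scaled and translated output

def max_pool2d(x, k=2):
    """Eq. (3): take the maximum over each k-by-k region R of the feature map."""
    h, w = x.shape[0] // k * k, x.shape[1] // k * k
    return x[:h, :w].reshape(h // k, k, w // k, k).max(axis=(1, 3))
```

With `gamma = 1` and `beta = 0`, `batch_norm` returns features with zero batch mean, matching the observation that BN outputs concentrate around 0 before scaling and translation.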
The calculation of Conv is represented by Eq. (4).
In Eq. (4), $N$ represents the size of the output transmission line characteristic map. $M$,
$F$, $P$, and $S$ correspond to the input image size, the convolution kernel size, the
padding, and the convolution stride, respectively.
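As a quick check of Eq. (4), the output size can be computed directly; this minimal sketch uses floor division, assuming the standard convention when the stride does not divide evenly:

```python
def conv_output_size(M, F, P, S):
    """Eq. (4): output feature-map size N = (M - F + 2P) / S + 1."""
    return (M - F + 2 * P) // S + 1

# A 640-pixel input with a 3x3 kernel, padding 1, and stride 2 halves the
# spatial size, as in a typical downsampling Conv:
print(conv_output_size(640, 3, 1, 2))  # -> 320
```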
However, YOLOv8 alone still cannot meet the detection requirements for foreign object images
on transmission lines. An attention mechanism can help YOLOv8
extract feature information from transmission line images while suppressing irrelevant
image features, greatly improving image recognition efficiency [17,
18]. As a lightweight attention module, CBAM can improve the overall performance of the
model with low parameter and computation overhead, enabling YOLOv8 to identify and
locate targets more accurately when processing images. Accordingly, the study introduces
CBAM into the YOLOv8 network, embedding it in the CSPDarkNet-53 layer and in the C2f module
of the Neck layer, and proposes a transmission line image detection
model based on the improved YOLOv8 network. Fig. 3 shows the proposed YOLOv8-CBAM transmission line image detection model.
Fig. 3. The YOLOv8-CBAM transmission line image detection model.
In Fig. 3, the proposed YOLOv8-CBAM transmission line image detection model mainly consists
of four parts: CSPDarkNet-53, SPPF, a multi-scale fusion structure, and the YOLOv8 detection
head. First, the model extracts transmission line image features in the CSPDarkNet-53
layer and fixes the image size. The improved C2f module extracts channel and spatial
information from the input transmission line feature map, which further
enhances multi-scale feature fusion, better captures feature expressions
at different levels, and improves the accuracy and stability of the detection network.
Then, the fixed-size feature maps are used to construct a feature pyramid that propagates
semantic information forward. Finally, the deep features are upsampled and fused
with the features of the previous layer, yielding higher-resolution and richer
feature representations that provide more accurate target localization and classification
information for the final detection head. The output feature value is represented by Eq. (5).
In Eq. (5), $w$ represents the convolutional kernel's weight value, and $x$ represents the
characteristic map of the transmission line to be tested. $R$ is the set of sampling
positions, $p_{0}$ is the position of the current output point, and $p_{n}$ enumerates
the sampling positions in $R$. The overall
implementation process of CBAM is represented by Eq. (6).
In Eq. (6), $M_{x}$ and $A_{x}$ represent maximum pooling and average pooling, respectively.
$f$ represents the fully connected layer.
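Eq. (6) can be illustrated with the channel branch of CBAM. In this sketch, the shared two-layer fully connected mapping $f$ (weights `w1`, `w2`) with a ReLU in between follows the standard CBAM formulation; the spatial branch is omitted for brevity, and the weight shapes are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """Eq. (6): sigma(f(A_x(x)) + f(M_x(x))) applied as per-channel weights.

    x: feature map of shape (C, H, W); w1: (C//r, C); w2: (C, C//r).
    """
    avg = x.mean(axis=(1, 2))   # A_x: global average pooling -> (C,)
    mx = x.max(axis=(1, 2))     # M_x: global max pooling -> (C,)
    f = lambda v: w2 @ np.maximum(w1 @ v, 0.0)  # shared FC layers with ReLU
    weights = sigmoid(f(avg) + f(mx))           # per-channel attention in (0, 1)
    return x * weights[:, None, None]           # reweight each channel of x
```

Because the attention weights lie in (0, 1), informative channels are preserved while irrelevant ones are suppressed, which is how CBAM filters the transmission line feature maps.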
2.2. Construction of External Force Damage Prevention System for Transmission Lines
The transmission line image detection model based on YOLOv8-CBAM can recognize the
damage caused by foreign objects to transmission line equipment through image recognition.
However, this alone is not sufficient to prevent transmission line network
security issues. Therefore, using the YOLOv8-CBAM transmission line image detection model
as the basic framework, a new transmission line external force damage
prevention system integrating YOLOv8 and IDA is constructed. Fig. 4 shows the basic process of IDA.
Fig. 4. Basic flow of intrusion detection algorithms.
In Fig. 4, IDA mainly includes three stages: information collection, information analysis,
and response decision-making. First, in the information collection stage, raw network
data are collected and preprocessed according to the set rules, providing
data-level support for the subsequent information analysis. Second, in the information
analysis stage, the data processed in the collection stage are analyzed in depth
to make accurate judgments about intrusion behavior. Finally, in
the response decision stage, intrusion behavior is immediately blocked according
to IDA's pre-set methods, such as prohibiting IP access or forcing a shutdown.
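The three stages above can be sketched as a toy pipeline. The detection rule (a simple packet-count threshold), the field names, and the response action are purely illustrative assumptions, not the system's actual detection logic:

```python
# Illustrative three-stage IDA sketch; rule names and thresholds are hypothetical.
def collect(raw_events):
    """Information collection: keep only well-formed network records."""
    return [e for e in raw_events if "src_ip" in e and "packets" in e]

def analyze(events, packet_threshold=1000):
    """Information analysis: flag events matching a simple intrusion rule."""
    return [e for e in events if e["packets"] > packet_threshold]

def respond(intrusions):
    """Response decision: map each intrusion to a pre-set action (here, block the IP)."""
    return [{"action": "block_ip", "target": e["src_ip"]} for e in intrusions]

events = [{"src_ip": "10.0.0.5", "packets": 12},
          {"src_ip": "10.0.0.9", "packets": 5000}]
decisions = respond(analyze(collect(events)))
print(decisions)  # -> [{'action': 'block_ip', 'target': '10.0.0.9'}]
```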
Fig. 5 shows the proposed transmission line external force damage prevention system.
Fig. 5. Structure of the transmission line external damage prevention system.
In Fig. 5, the proposed transmission line external force damage prevention system structure
mainly consists of four parts, namely data preprocessing, mixed sampling, improved
YOLOv8 transmission line image detection model, and classification results. Firstly,
the transmission line foreign object image dataset is used as the test data source.
The data preprocessing part is responsible for merging the input data information
to ensure that the transmission line foreign object damage prevention system model
can learn adequately. This includes cropping, scaling, and color correction of the
image data to fit the model inputs. To train the transmission line external
damage prevention system model better without affecting the classification results,
the study also applies the Maximum-Minimum (Max-Min) method to normalize the data.
This step eliminates the effect of differing scales so that the data can be compared
on a uniform scale, which in turn improves the efficiency and effectiveness of model
training. At the same time, the image data are augmented by rotation, scaling, and
shearing to increase the diversity of the dataset and the generalization ability of the model.
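The Max-Min rescaling used here (given later as Eq. (7)) can be sketched as follows; the small epsilon guarding constant columns is an implementation detail added here, not part of the equation:

```python
import numpy as np

def max_min_normalize(x):
    """Eq. (7): x' = (x - M_min) / (M_max - M_min), mapping each feature to [0, 1]."""
    m_min, m_max = x.min(axis=0), x.max(axis=0)
    return (x - m_min) / (m_max - m_min + 1e-12)  # epsilon avoids division by zero
```

After this step every feature lies on the same [0, 1] scale, so no single feature dominates training merely because of its units.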
Due to the possible imbalance between the various types of samples in the transmission
line foreign object image dataset, a mixed sampling method is used to balance the
dataset to ensure that the model is not biased against a particular category. Moreover,
the Principal Component Analysis (PCA) algorithm is used to reduce the dimensionality
of the data while retaining the most important features. Subsequently,
the processed data are fed into the YOLOv8 network for model training. During the
training process, the model is evaluated using a validation set to monitor the performance
of the model and prevent overfitting. Finally, the classification results are obtained
through the transmission line image detection model. The calculation of Max-Min is
represented by Eq. (7).
In Eq. (7), $x^{'}$ represents the normalized transmission line data feature. $M_{min}$ and
$M_{max}$ correspond to the minimum and maximum feature values, respectively. The PCA
information content is generally computed through mean removal, mean deviation, the
covariance matrix, and the contribution rate. The mean removal that decenters the
data features is represented by Eq. (8).
In Eq. (8), $\mu$ and $n$ represent the sample mean and the number of samples, respectively. The
mean deviation is represented by Eq. (9).
In Eq. (9), $\phi_{i}$ represents the deviation from the mean $\mu$ for each sample $x_{i}$.
The covariance matrix is represented by Eq. (10).
In Eq. (10), $S$ represents the covariance matrix of the dataset samples, and the superscript
$T$ denotes the matrix transpose. The contribution rate $\eta$ is represented by Eq. (11).
In Eq. (11), $m$ and $k$ represent matrix positions, $\alpha_{i}$ represents an eigenvector,
and $n$ represents the number of samples. Fig. 6 shows the training of the proposed
transmission line external force damage prevention system.
Fig. 6. Training process.
In Fig. 6, the proposed system involves five training steps. First, the input data are
processed with mixed sampling and PCA, and an 8×8 grayscale image serves as the input layer.
Then, Conv layers extract features from the grayscale images, and BN batch-normalizes
the extracted feature data to accelerate convergence. Next, maximum
pooling is applied to the processed data. The fully connected layer then recombines
the output features to reduce feature loss. Finally, the classification result is output
through Softmax.
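The PCA steps of Eqs. (8)-(11) used in the preprocessing above can be sketched as follows. Two details are assumptions here: the contribution rate is taken as the share of variance carried by the top-$k$ eigenvalues, and the covariance is normalized by $1/n$:

```python
import numpy as np

def pca_contribution(X, k):
    """Sketch of Eqs. (8)-(11): mean removal, deviations, covariance, contribution rate."""
    n = X.shape[0]
    mu = X.mean(axis=0)                             # Eq. (8): sample mean
    phi = X - mu                                    # Eq. (9): deviation of each sample
    S = (phi.T @ phi) / n                           # Eq. (10): covariance matrix
    eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]  # eigenvalues, descending
    eta = eigvals[:k].sum() / eigvals.sum()         # Eq. (11): contribution rate
    return S, eta
```

A contribution rate close to 1 for small $k$ means most of the variance survives dimensionality reduction, which is why PCA can shrink the input while retaining the most important features.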