3.1 Frequency-slicing Wavelet Transform-based Rotating Machinery Vibration Fault Signal
Identification Model
Considering that the vibration signal of rotating machinery contains considerable
noise interference, feeding the original signal into the model without processing
will reduce its diagnostic performance on fault signals. A combination of the
frequency-slicing wavelet transform and a capsule network is therefore used to
improve the accuracy of fault signal recognition. The frequency-slicing wavelet
transform retains the arbitrary time-frequency resolution of the continuous wavelet
transform. At the same time, it does not require wavelet basis functions for the
inverse transform, which is instead computed with a Fourier transform.
Assume that the original vibration signal of the rotating machine is $f(t)$ with
$f(t)\in L^{2}(R)$, and that the Fourier transform of $f(t)$ is expressed as $\hat{f}(w)$,
where $R$ is the set of real numbers and $L^{2}(R)$ is the space of square-integrable
functions. The frequency-sliced wavelet transform of this signal can then be expressed
using Eq. (1). In Eq. (1), $\sigma $ and $\lambda $ are the scale factor and energy
factor, respectively, and $u$ is the estimated frequency. $t$ and $w$ represent the
signal monitoring time and its monitoring frequency, respectively. $\hat{p}$
and $\hat{p}^{\ast }$ are the frequency slice function and its conjugate function,
respectively; $e$ is the base of the natural logarithm and $i$ is the imaginary unit.
Eq. (2) is the defining equation for the frequency resolution.
where $\Delta w_{p}$ is the width of the frequency window corresponding to the frequency
slice function and $\eta _{p}$ is the frequency resolution. A time-frequency resolution
factor $K$, defined as $K=w/\sigma $, is also introduced to control the sensitivity
to the time or frequency dimension of the signal. Eq. (2) can therefore be rewritten as Eq. (3).
The time resolution tends to vary inversely with the frequency resolution,
yielding a multi-resolution analysis effect. In general, $\eta _{p} \ll 1$.
The frequency-slicing function can be regarded as a band-pass filter
extended to the time-frequency domain. Eq. (4) shows two common frequency-slicing functions.
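As a rough numerical sketch of the transform, the spectrum can be windowed by the slice function around each analysed frequency and inverse-transformed. The Gaussian slice function, the parameter name `kappa` for the resolution factor $K$, and the normalization constants below are illustrative assumptions; the exact form in Eq. (1) may differ.

```python
import numpy as np

def fswt(signal, fs, freqs, kappa=5.0):
    """Frequency-slicing wavelet transform, sketched via the FFT.

    For each analysed frequency w0 the spectrum f_hat(u) is windowed by
    the conjugate slice function p_hat*((u - w0) / sigma), with the scale
    sigma = w0 / kappa following K = w / sigma, and inverse-transformed.
    A Gaussian slice function p_hat(w) = exp(-w**2 / 2) is assumed here.
    """
    n = len(signal)
    f_hat = np.fft.fft(signal)
    u = 2 * np.pi * np.fft.fftfreq(n, d=1.0 / fs)      # angular frequency grid
    tfr = np.empty((len(freqs), n), dtype=complex)
    for row, f0 in enumerate(freqs):
        w0 = 2 * np.pi * f0
        sigma = w0 / kappa if f0 > 0 else 1.0          # avoid the w0 = 0 singularity
        window = np.exp(-((u - w0) / sigma) ** 2 / 2)  # real Gaussian, so conjugate = itself
        tfr[row] = np.fft.ifft(f_hat * window)
    return tfr
```

For a pure tone, the energy of `tfr` concentrates in the row whose slice frequency matches the tone, which is the band-pass-filter behaviour described above.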
The time-to-bandwidth ratio is a common performance indicator for frequency-slicing
functions: the smaller the ratio, the better the function concentrates signal energy
in the time-frequency domain. Both functions presented in Eq. (4) have a time-to-bandwidth
ratio of 0.5. In theory, the frequency-slicing wavelet transform can therefore perform
well in the time-frequency domain. In practice, however, the unoptimized frequency-slicing
wavelet transform selects the monitoring frequencies erroneously in the presence of the
noise interference inherent in mechanical vibration signals, degrading the recognition
results. Optimal selection of the monitoring frequencies is achieved by introducing an
energy ratio factor, yielding the Energy Ratio Based FSWT (ERB-FSWT). Fig. 1 shows a
flow diagram of the ERB-FSWT.
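A minimal sketch of such an energy-ratio-based choice of the cut-off frequency $\Delta F$ follows. The thresholding rule (keeping at least a $1-\varepsilon$ share of the spectral energy below $\Delta F$) is an assumption about how the small constant $\varepsilon$ is applied; the exact rule of Eq. (5) may differ.

```python
import numpy as np

def erb_cutoff(signal, fs, eps=0.01):
    """Choose the cut-off frequency Delta_F so that the band [0, Delta_F]
    holds at least a (1 - eps) share of the total spectral energy.
    eps plays the role of the small energy occupation constant."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2          # one-sided energy spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    ratio = np.cumsum(spectrum) / np.sum(spectrum)       # E_{Delta F} / E_sum per bin
    idx = int(np.searchsorted(ratio, 1.0 - eps))
    return float(freqs[min(idx, len(freqs) - 1)])
```

For a signal dominated by a low-frequency tone plus weak high-frequency content, the selected cut-off lands just above the dominant tone, restricting the monitoring range to the informative low-frequency band.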
As shown in Fig. 1, the energy occupation factor $\varepsilon $ was set to a small
constant, and the energy value and cut-off frequency were calculated using Eq. (5).
When the monitoring range was set to $[0,\Delta F]$, the frequency-slicing wavelet
transform subdivides the signal frequencies and calculates the time-frequency
decomposition coefficients to obtain the time-frequency distribution of the signal
in that range. Eq. (5) is the mathematical expression for the energy occupation factor.
where $\Delta F$ is the cut-off frequency; $E_{\Delta F}$ is the energy value from
the starting frequency to the cut-off frequency; and $E_{sum}$ is the sum of the energy
values over the monitored frequency range. The initial monitoring frequency was set
to zero because rotating machinery tends to vibrate in the low-frequency band rather
than the high-frequency band. As Fig. 1 shows, the output of the frequency-slicing
wavelet transform optimized by the energy occupation factor is passed to a fault
classifier for fault identification in rotating machinery vibration signals. Such a
classifier is a deep learning model, usually implemented as a convolutional neural
network (CNN). However, CNNs rely on individual scalar neurons when extracting data
features, which limits their classification and recognition performance. In addition,
CNNs learn affine transformations poorly during convolution operations, losing some
signal features. Capsule networks are attracting attention because they convert single
neurons into combined (vector) neurons, preserving the interrelationships in the data
being processed. In contrast to CNNs, capsule networks use so-called ``capsule vectors''
as the output values of the model, which allows a more diverse range of features to be
extracted from the signal data. The network also uses a ``dynamic routing'' algorithm
to form a capsule layer instead of the traditional pooling layer used in CNNs, reducing
the loss of signal data. Fig. 2 presents the structure of the capsule network.
As shown in Fig. 2, the starting capsule layer forms the capsule activity vectors. The
digital capsule layer transforms the length information of the capsule vectors into
probabilistic information using a transformation matrix. The final classification
capsule layer outputs the classification and recognition result for the signal. The
capsule network retains the same basic convolutional layers as a CNN. Eq. (6) is the
mathematical expression for the convolution operation.
where $x_{k'}^{\left(l\right)}$ is the output value of the $k'$-th feature in network
layer $l$, and $k$ is the input feature index. $\ast $ denotes the discrete convolution
of network layer $l$ with network layer $l-1$ on the $k$-th feature, where the
convolution kernel is $w_{k}^{\left(l\right)}$. $b_{k}^{l}$ is the bias matrix of the
network, and $\phi $ is the activation function acting on the convolutional output.
Common activation functions are the hyperbolic tangent function, the sigmoid function,
and the Rectified Linear Unit (ReLU). Because the first two suffer from the
vanishing-gradient problem, the ReLU function is used as the activation function of
the model. Eq. (7) is the mathematical expression of the ReLU; Eq. (8) shows the
expression of the output after activation using this function.
Eq. (7) shows that the ReLU is a piecewise linear function. In Eq. (8),
$x_{ijk}^{\left(l\right)}$ denotes the $(i,j)$ output element of network layer $l$
when the number of features is $k$. The dynamic routing algorithm in the capsule
network consists of the following three processes.
Process 1 multiplies the neurons by their corresponding weights to obtain the output
prediction vectors. Eq. (9) is the mathematical expression for process 1.
where $u_{i}$ is the $i$-th neuron of the previous layer; $W_{ij}$ is the matrix
of neuron weights; and $u_{j\left| i\right.}$ is the output prediction vector. Process
2 obtains the total output vector by weighting and summing the output prediction
vectors from process 1. Eq. (10) is the mathematical expression for process 2.
where $C_{ij}$ is the coupling coefficient, and $S_{j}$ is the total output vector.
Process 3 is a nonlinear compressive mapping transformation of this total output variable,
as expressed in Eq. (11).
where $j$ represents the output neuron number. This dynamic routing algorithm avoids
the problem of gradient disappearance during network training because the output vector
is calculated directly without backpropagation.
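The three processes above can be sketched as follows. The capsule dimensions and the single routing pass are illustrative assumptions; practical capsule networks iterate the coupling-coefficient update several times.

```python
import numpy as np

def squash(s):
    """Eq. (11): nonlinear compression that keeps the direction of the
    total output vector while mapping its length into [0, 1)."""
    norm_sq = np.sum(s ** 2, axis=-1, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + 1e-9)

def routing_step(u, W, b):
    """One dynamic-routing pass over capsules.

    u: (num_in, d_in) input capsule vectors
    W: (num_in, num_out, d_out, d_in) neuron weight matrices
    b: (num_in, num_out) routing logits (zeros initially)
    """
    # Process 1, Eq. (9): prediction vectors u_hat_{j|i} = W_ij u_i
    u_hat = np.einsum('ijkl,il->ijk', W, u)
    # Process 2, Eq. (10): coupling coefficients C_ij (softmax over the
    # output capsules) and weighted sum S_j
    c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
    s = np.einsum('ij,ijk->jk', c, u_hat)
    # Process 3, Eq. (11): squash S_j into the output capsule vectors
    return squash(s)
```

The lengths of the returned vectors lie in $[0, 1)$ and can be read as class probabilities, which is how the digital capsule layer turns vector lengths into probabilistic information.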
Fig. 1. ERB-FSWT Process Diagram.
Fig. 2. Structure diagram of the capsule network.
3.2 Dynamically Weighted Improved Capsule Network Fault Identification Model Based
on the Channel Attention Mechanism
Deep learning fault recognition models, including capsule networks, assume that the
data distribution does not change between sample training and sample testing. In
practice, however, rotating machinery operates under highly complex conditions.
Network recognition accuracy and generalization capability will suffer if an
unimproved deep learning model is used without accounting for the variability of
actual operating conditions. Moreover, multiple sensors are used to collect signal
data when monitoring vibration in rotating machinery. To fuse the signals collected
by these sensors, this study proposes a dynamic weighting method based on the channel
attention mechanism to improve the adaptability of the model to complex signals.
Table 1 lists the fusion levels and their characteristics for the information
collected by multiple sensors.
Table 1 shows that signal fusion is divided into three levels (low, medium, and high),
with input forms of the data, feature, and decision types. Suppose the number of
sensors used in the signal acquisition of the rotating machinery is $m$ and the
vibration signal data of sensor $i$ is $x_{i}$. The ERB-FSWT then gives a
time-frequency image $y_{i}$. These time-frequency images are stitched together
along the channel layer to obtain a feature map $Y$. Eq. (12) expresses this
feature map.
where $l$ and $w$ indicate the length and width of the feature map, and $C$ indicates
the channel-layer splicing (concatenation) operator. After the channel-layer splicing
is complete, a channel scaling operation is also required. Eq. (13) is the
mathematical expression for this operation.
where $S^{k}$ indicates the output of the channel scaling operation, $GAP(\cdot )$
denotes the global average pooling process, and $k$ denotes the number of channels.
Eq. (14) expresses the channel decay and excitation process.
where $F_{1}^{j}$ and $F_{2}^{j}$ are the outputs of the first and second fully
connected layers, respectively; $W_{1}^{ij}$ and $W_{2}^{ij}$ are the weight matrices;
and $b_{1}^{i}$ and $b_{2}^{i}$ are the bias terms. The channel decay and excitation
process models the correlations among all the scaled channels, mainly through two
fully connected layers in series. $F^{j}$ denotes the $j$-th output neuron of the
fully connected layer. Eq. (15) expresses the weight value generation.
where $w_{k}$ is the normalized weight value. Sigmoid denotes the activation function,
which maps the output signal features of the second fully connected layer into
$(0,1)$, yielding a normalized weight value for every channel. Eq. (16) expresses the
final dynamic weighting process, where $\hat{Y}$ denotes the feature map obtained by
dynamic weighting and $scale$ is defined as the channel weighting operation.
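Taken together, Eqs. (12)-(16) amount to a squeeze-and-excitation style channel weighting. A minimal sketch follows; the channel count, the reduction ratio, and the random parameters are illustrative assumptions standing in for the trained fully connected layers.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dynamic_weighting(Y, W1, b1, W2, b2):
    """Channel-attention based dynamic weighting of a spliced feature map.

    Y : (k, l, w) feature map spliced from the k sensor channels, Eq. (12)
    W1, b1 / W2, b2 : parameters of the two fully connected layers
    Returns the reweighted feature map Y_hat of Eq. (16).
    """
    s = Y.mean(axis=(1, 2))          # Eq. (13): global average pooling per channel
    f1 = relu(W1 @ s + b1)           # Eq. (14): channel decay (reduction)...
    f2 = W2 @ f1 + b2                # ...and excitation (restoration)
    wk = sigmoid(f2)                 # Eq. (15): normalized channel weights in (0, 1)
    return Y * wk[:, None, None]     # Eq. (16): scale each channel by its weight

k, r = 4, 2                          # channels and an assumed reduction ratio
rng = np.random.default_rng(0)
Y = rng.normal(size=(k, 8, 8))
W1, b1 = rng.normal(size=(k // r, k)), np.zeros(k // r)
W2, b2 = rng.normal(size=(k, k // r)), np.zeros(k)
Y_hat = dynamic_weighting(Y, W1, b1, W2, b2)
```

Because each channel weight lies in $(0,1)$, the operation attenuates less informative sensor channels while preserving the spatial layout of the time-frequency images.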
Fig. 3 presents a flow diagram of the fault identification method, which consists of
the following steps. First, the rotating machinery vibration data received via the
multiple sensors are pre-processed into raw data. The data are then transformed into
time-frequency images using the ERB-FSWT. The time-frequency images are divided into
training and test samples. The training samples are weight-normalized in the channel
layer, i.e., a weighted fusion operation, and used to train the model. The test
samples are then fed into the trained model for fault signal identification, and the
results are analyzed.
Fig. 3. Flow diagram of fault identification.
Table 1. Signal fusion levels and their characteristics.

Level  | Fusion type    | Input form           | Output form | Characteristic
Low    | Data layer     | Data                 | Data        | Direct fusion of raw data
Medium | Feature layer  | Data / Features      | Features    | Fusion of extracted attribute features before the decision level
High   | Decision layer | Features / Decisions | Decision    | Fusion of decision evidence