3.1.1 1DCNN for regression tasks
A CNN consists of three basic components: convolutional layers, pooling layers, and fully connected layers. The CNN was first proposed by LeCun et al. Convolutional and pooling layers extract features automatically, while fully connected layers serve as predictors or classifiers.
Convolutional layer
In a 1DCNN, the operation is performed using one-dimensional filters. Each filter combines the inputs to produce the corresponding features. The operation of a 1D convolutional layer with a single filter is expressed as
Here ${oz}^k_l$ denotes the output feature map, ${af}_c$ is the activation function, $*$ denotes the convolution operation, $x\in r^{w\times l}$ is the input, and $b$ and $k_l$ are the bias and kernel of the $l$-th filter, with $l\in\{1,\dots,n\}$ indexing the selected kernels.
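To make the operation concrete, the following is a minimal NumPy sketch of a single-filter 1D convolution with bias and activation. The ReLU activation, the "valid" output length, and all numeric values are illustrative assumptions rather than settings taken from the paper.

```python
import numpy as np

def conv1d_single_filter(x, k_l, b_l, af_c=lambda z: np.maximum(z, 0.0)):
    """Single-filter 1D convolution: slide k_l over x, add the bias b_l,
    then apply the activation af_c (ReLU assumed here)."""
    width = len(k_l)
    oz = np.array([np.dot(x[i:i + width], k_l) + b_l
                   for i in range(len(x) - width + 1)])
    return af_c(oz)

# Illustrative input signal and a kernel of size 3.
x = np.array([0.1, 0.5, -0.2, 0.8, 0.3, -0.1, 0.4, 0.2])
feature_map = conv1d_single_filter(x, k_l=np.array([0.2, -0.5, 0.3]), b_l=0.05)
```

As in most CNN implementations, the sliding dot product above is technically a cross-correlation, which frameworks conventionally call convolution.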
Pooling layer
Through max-pooling, the significant features are retained while the number of features is reduced. The pooling operation is expressed as
In Eq. (2), $q$ and $r$ denote the row and column of the features after pooling, and $l_p$ and $w_p$ denote the length and width of the pooling filter.
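A matching sketch of the max-pooling step in Eq. (2) is given below; the $l_p \times w_p$ window size and the non-overlapping stride are assumptions made for illustration.

```python
import numpy as np

def max_pool2d(fmap, l_p=2, w_p=2):
    """Max pooling: keep the maximum inside each l_p x w_p window,
    so significant features are retained while their number shrinks."""
    rows, cols = fmap.shape[0] // l_p, fmap.shape[1] // w_p
    pooled = np.empty((rows, cols))
    for q in range(rows):          # row index after pooling
        for r in range(cols):      # column index after pooling
            pooled[q, r] = fmap[q * l_p:(q + 1) * l_p,
                                r * w_p:(r + 1) * w_p].max()
    return pooled

# Illustrative 4x4 feature map pooled down to 2x2.
pooled = max_pool2d(np.arange(16.0).reshape(4, 4))
```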
Fully connected layer
Following feature extraction, the feature maps are flattened into a one-dimensional array and fed into the fully connected layers. In the fully connected layers, the feedforward function of a single neuron is expressed as
Here, ${af}_f$ is the activation function, ${we}_a$ is the weight, ${in}_a$ denotes the input of the neuron, $b$ is the bias, and $y$ is the output obtained from the 1DCNN.
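A single fully connected neuron as described above can be sketched as follows; the tanh activation and the example values are assumptions for illustration only.

```python
import numpy as np

def dense_neuron(in_a, we_a, b, af_f=np.tanh):
    """Feedforward of one neuron: y = af_f(sum over a of we_a * in_a + b)."""
    return af_f(np.dot(we_a, in_a) + b)

# Flattened feature map (illustrative) fed to one output neuron.
flat = np.array([0.3, -0.1, 0.7, 0.2])
y = dense_neuron(flat, we_a=np.array([0.5, -0.2, 0.1, 0.4]), b=0.05)
```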
3.1.2 Optimization with PSO
To improve the structure of the 1DCNN, PSO is applied. First, the optimization target and the fitness function are specified; the fitness function is used to score each particle. The particles then adjust their directions and positions according to their own best locations and the best location of the group, using
and
where $w$ is the inertia weight, $t$ is the iteration index, ${dv}_i$ is the direction of the $i$-th particle, ${cw}_1$ and ${cw}_2$ are the weights that control how strongly ${pl}_{pbest}$ and ${pl}_{gbest}$ influence the update, and ${pl}_i\left(t\right)$ is the location of the $i$-th particle at the $t$-th iteration. If the iteration count reaches the predefined maximum or the fitness of ${pl}_{gbest}$ no longer changes, the optimization terminates and ${pl}_{gbest}$ is taken as the optimal result.
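The update rules above correspond to the standard PSO velocity and position equations; the sketch below assumes that form, with random coefficients $r_1, r_2$ drawn from $[0,1]$ and illustrative values for $w$, ${cw}_1$, and ${cw}_2$, since the paper's settings are not restated in this section.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def pso_step(pl, dv, pl_pbest, pl_gbest, w=0.7, cw1=1.5, cw2=1.5):
    """One PSO iteration for all particles (rows of pl).
    dv: current directions, pl_pbest: per-particle best locations,
    pl_gbest: best location of the whole group."""
    r1, r2 = rng.random(pl.shape), rng.random(pl.shape)
    dv_new = w * dv + cw1 * r1 * (pl_pbest - pl) + cw2 * r2 * (pl_gbest - pl)
    pl_new = pl + dv_new
    return pl_new, dv_new

# Illustrative 4-particle swarm over a 3-dimensional search space.
pl = rng.random((4, 3)); dv = np.zeros((4, 3))
pl, dv = pso_step(pl, dv, pl_pbest=pl.copy(), pl_gbest=pl[0])
```

In this setting each particle would encode a candidate 1DCNN configuration (for example, kernel sizes or filter counts) scored by the fitness function; the exact encoding is not specified in this section.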
3.1.3 2DCNN for classification tasks
Classification tasks such as identifying bearing faults and tool wear are handled by the 2DCNN. The 2DCNN processes STFT time-frequency spectrum images to identify spatial patterns. The STFT transforms the signal into the time-frequency domain.
The signal is first split into short-time segments, and the DFT is then used to calculate the frequency distribution of each segment. Finally, stacking the frequency spectra of the segments yields the time-frequency representation. The STFT is expressed as
Here $a_x$ denotes the discrete signal of length $N$, $\omega$ is the frequency, $n$ is the index of the data points in $a_x$, $w$ is the discrete window function, and $m_i$ is the discrete index within the window.
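The segment-and-stack procedure can be sketched directly in NumPy. The Hann window, segment length, and hop size below are illustrative assumptions, and the magnitude spectrum is taken as the image passed to the 2DCNN.

```python
import numpy as np

def stft_image(a_x, seg_len=256, hop=128):
    """Split a_x into short segments, apply a window, take the DFT of each
    segment, and stack the spectra into a time-frequency matrix."""
    w = np.hanning(seg_len)                    # discrete window function
    n_segs = 1 + (len(a_x) - seg_len) // hop
    spectra = [np.fft.rfft(w * a_x[i * hop: i * hop + seg_len])
               for i in range(n_segs)]
    return np.abs(np.stack(spectra, axis=1))   # rows: frequency, columns: time

# Illustrative vibration-like signal: a 1 kHz tone sampled at 12 kHz.
t = np.arange(12000) / 12000.0
tf_image = stft_image(np.sin(2.0 * np.pi * 1000.0 * t))
```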
3.1.4 Integration and training using DNN
In this section, a DNN is used to combine the features extracted from the 1DCNN and the 2DCNN. By combining supervised and unsupervised learning techniques, the DNN is trained to reduce manual feature selection and to improve overall mechanical fault diagnosis performance.
The DNN is pre-trained in an unsupervised manner using a stack of DAEs. The input data is first encoded into a lower-dimensional space by each DAE layer and then reconstructed. The encoding function transforms the input data into an encoded vector and is expressed as
Here ${as}_f$ is the activation function, and $w$ and $b$ are the parameters of the encoding network; together they determine the encoded vector ${eh}_m$.
Similarly, the decoding function reconstructs the input data from the encoded vector using the activation function and the parameters of the decoding network. This can be expressed as
In Eq. (8), ${as}_g$ is the activation function of the decoding network, $w$ and $d_c$ are the parameters of the decoding network, and ${\widehat{rx}}_m$ is the reconstructed input. The goal of this process is to reduce the reconstruction error between the original input and the reconstructed output, ensuring that the encoded vector preserves the information in the input data. The reconstruction error is defined as
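A compact sketch of one DAE layer's encoding, decoding, and reconstruction error follows. The sigmoid activations, the tied decoding weights ($w^T$), and the mean-squared error are common choices assumed here; the paper's exact parameterization of $w$, $b$, and $d_c$ is not restated in this section.

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def encode(rx_m, w, b, as_f=sigmoid):
    """Encoding function: eh_m = as_f(w @ rx_m + b)."""
    return as_f(w @ rx_m + b)

def decode(eh_m, w, d_c, as_g=sigmoid):
    """Decoding function: reconstruct the input from the encoded vector."""
    return as_g(w.T @ eh_m + d_c)          # tied weights assumed

def reconstruction_error(rx_m, rx_hat):
    """Mean-squared reconstruction error between input and reconstruction."""
    return np.mean((rx_m - rx_hat) ** 2)
```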
To improve feature extraction, the DAE introduces noise into the input samples. This can be expressed as
In this way, the autoencoder learns noise-resistant features, which improves the model's ability to cope with variations in the data.
The DAE's training goal is to reduce the reconstruction error between the output reconstructed from the noisy input and the original clean input. The training objective of the DAE is defined as
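The noise-injection step can be sketched as additive Gaussian corruption; the Gaussian form and its standard deviation are assumptions, since the text only states that noise is introduced to the input samples.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def corrupt(rx_m, noise_std=0.1):
    """Create the noisy DAE input by adding Gaussian noise to the clean sample."""
    return rx_m + rng.normal(scale=noise_std, size=rx_m.shape)

# DAE objective (sketch): reconstruct the clean rx_m from corrupt(rx_m), i.e.
# minimise reconstruction_error(rx_m, decode(encode(corrupt(rx_m), w, b), w, d_c))
# using the encode/decode/reconstruction_error functions sketched above.
```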
A stack of DAEs is trained with a layer-by-layer pre-training procedure to initialize the DNN. The input layer of the DNN is trained first, followed sequentially by the deeper layers, and at each layer the data is encoded into a lower-dimensional representation.
This approach enables the DNN to identify complex patterns and features in the data. After all layers have been trained, the final encoded representation serves as the model's input for the next stage. The final encoding is denoted as
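The layer-by-layer procedure can be summarized in the following sketch: each layer's encoder is trained as a DAE on the codes produced by the layers below, and the data is then pushed through that encoder before the next layer is trained. The `train_dae_layer` routine is a hypothetical placeholder for the per-layer DAE training described above.

```python
def pretrain_stack(data, layer_sizes, train_dae_layer):
    """Greedy layer-wise pre-training: train each DAE on the codes produced
    by the already-trained layers, then encode the data for the next layer.
    `train_dae_layer(x, size)` is assumed to return (encode_fn, params)."""
    codes, encoders = data, []
    for size in layer_sizes:
        encode_fn, params = train_dae_layer(codes, size)   # hypothetical trainer
        encoders.append((encode_fn, params))
        codes = encode_fn(codes, params)   # lower-dimensional representation
    return encoders, codes                 # `codes` is the final encoding
```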
After unsupervised pre-training is completed, the DNN undergoes a supervised fine-tuning stage. In this stage, the DNN parameters are adjusted to improve the model's performance on the specific tasks. Based on the encoded representations from the earlier layers, the DNN generates its output, which is expressed as
The optimization objective of this stage is to reduce the loss between the predicted output and the true target values, thereby improving the model's fault diagnosis accuracy.
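The fine-tuning objective can be written as a standard loss between predictions and targets; mean-squared error for the regression head and cross-entropy for the classification head are assumptions consistent with the two task types, not values quoted from the paper.

```python
import numpy as np

def mse_loss(ry_pred, ry_true):
    """Regression loss: mean-squared error between prediction and target."""
    return np.mean((ry_pred - ry_true) ** 2)

def cross_entropy_loss(probs, labels, eps=1e-12):
    """Classification loss: negative log-likelihood of the true class labels.
    probs has shape (n_samples, n_classes); labels are integer class indices."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + eps))
```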
Overall, the proposed HDCNN combines 1DCNN and 2DCNN features using the trained DNN. For regression tasks, the 1DCNN processes raw vibration data with convolutional operations to extract the relevant features; for classification tasks, the 2DCNN handles time-frequency images obtained through the STFT. The features of both branches are concatenated and passed to the DNN for the final classification and regression tasks. The process of feature integration and training with the DNN is expressed as
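Feature integration can be sketched as concatenating the two branch outputs and passing them through the pre-trained DNN layers; the number of layers, their sizes, and the sigmoid activation below are illustrative assumptions.

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def hdcnn_head(feat_1d, feat_2d, dnn_weights, dnn_biases):
    """Concatenate 1DCNN and 2DCNN features and run them through the DNN.
    dnn_weights/dnn_biases hold the pre-trained (then fine-tuned) layer
    parameters; the last layer produces the diagnosis output ry_m."""
    eh_m = np.concatenate([feat_1d, feat_2d])
    for w, b in zip(dnn_weights, dnn_biases):
        eh_m = sigmoid(w @ eh_m + b)   # eh^(N) computed from eh^(N-1)
    return eh_m                        # final result ry_m
```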
Finally, the 1DCNN and 2DCNN features are combined through Eqs. (16) and (17) using the trained DNN, and the final output of the DNN for fault diagnosis is expressed as
In Eq. (18), ${eh}^{\left(N-1\right)}_m$ is the input of the $N$-th layer, ${eh}^N_m$ is the output of the $N$-th layer, and $ry_m$ is the final result after applying the output-layer transformation. Fig. 2 presents the overall structure of the HDCNN.
Fig. 2. Proposed HDCNN combined architecture.