1. Introduction
With the rapid development of deep learning and computer vision, which have achieved notable success in data mining, natural language processing, and other fields, image classification has become the basis for core applications such as image detection, segmentation, object tracking, and behavior analysis [1, 2]. However, in practical image data processing, traditional deep learning tasks require large-scale datasets, complex networks, powerful computing resources, and a large amount of high-quality labeled data, resulting in high training costs and complex workflows [3, 4]. In this context, Few-Shot Learning (FSL) has emerged, with the goal of improving the generalization ability of models from minimal labeled data and meeting training requirements when data are sparse. In the research of Vázquez CG and colleagues, classifying abnormal heart image data required extensive data labeling, but high-quality labeled image data were difficult to obtain. FSL was therefore used to address this problem, and the effects of label noise and self-learning label correction on heart image classification were examined separately; an optimized FSL model was used for multi-label learning, which improved accuracy by 5% when training on the MNIST dataset [5]. However, FSL still faces great challenges in image classification: in particular, the accuracy and generalization ability of the model are severely limited when very few samples are available [6]. This study focuses on applying FSL to image classification. It conducts an in-depth study and improvement of the few-shot image classification algorithm based on the Image Deformation Meta-Network (IDeMe-Net), aiming to improve the generalization ability of the model so that it achieves good classification performance with smaller samples, and to provide valuable guidance for improving FSL image classification algorithms [7]. The main contributions of the research are twofold. First, the study provides a new solution for image classification problems in FSL, improving the model's generalization ability. Second, the proposed improvement strategies and the dynamic adaptive fusion strategy not only improve the performance of the model, but also provide a reference for further research in the field of image classification.
2. Related Work
FSL is characterized by using a very small amount of labeled data to acquire problem-solving ability, learning the commonalities between different tasks rather than solving each task in isolation. Zhou X and other scholars proposed a few-shot learning model based on Siamese Convolutional Neural Networks, which constructs a Siamese convolutional coding network to measure the distance between input samples through an optimized feature representation. Experiments demonstrated that the model significantly reduced the false alarm rate and improved the F1 score in intrusion signal detection for the security protection of industrial network systems [8]. Chen et al. proposed a novel hierarchical Graph Neural Network (GNN) for few-shot learning, which consists of three parts, bottom-up inference, top-down inference, and skip connections, to achieve efficient learning of multilevel relationships. Experimental results on benchmark datasets show that the proposed method significantly outperforms other state-of-the-art GNN-based methods [9]. Zhang et al. proposed transforming the features extracted by a self-supervised feature extractor into a Gaussian distribution in order to minimize mismatches in feature distributions, thereby significantly improving the meta-training of graph networks. Experimental results show that their method significantly outperforms the existing optimal results in both fully supervised and semi-supervised settings, yielding a 12% improvement over the baseline performance [10]. Song et al. addressed few-shot learning in computer vision by performing spatial attention simultaneously in the image and embedding spaces, and by introducing a Meta-Learning (ML) module to adaptively fuse the local features of each individual embedding. Experiments showed that although designing spatial attention methods for few-shot learning is a very complex task, the method is effective [11]. Li et al. used multilevel second-order attention representations and contextual similarity to address how to learn a robust representation and how to select and label unlabeled instances to build discriminative classifiers. Extensive experiments on four commonly used benchmark datasets show that this simple yet effective method is comparable to the available state-of-the-art methods [12].
With the rapid development of deep learning, learning methods for few-shot classification are becoming increasingly rich; they fall into three main categories: data-augmentation-based, ML-based, and metric-based learning. Researchers such as Wen J have proposed a novel FSL method called multi-scale metric learning, which introduces a feature pyramid structure for multi-scale feature embedding and performs hierarchical metric learning through a multi-scale relation-generating network. Experimental results on the Mini-ImageNet and Tiered-ImageNet datasets demonstrate that the method achieves superior performance on the few-shot learning problem [13]. Zhang et al. proposed an efficient method based on stacked sparse auto-encoders and Siamese networks for diagnosing short-circuit faults in permanent magnet synchronous motors. The method employs an encoder to extract sparse features from a limited number of samples and uses a Siamese network to determine the similarity between given samples, which transforms the fault diagnosis problem into a classification problem under few-shot learning [14]. Scholars such as Zhang have proposed a deep-neural-network-based few-shot learning method, which trains a twin (Siamese) neural network on pairs of samples from the same or different categories to learn similarity and dissimilarity. Experimental results on a standard benchmark dataset for bearing fault diagnosis show that this few-shot learning method is more effective for fault diagnosis with limited data availability [15]. Das and Lee proposed a multilayer neural network architecture for few-shot image identification of new categories. This architecture encodes transferable knowledge extracted from a large annotated dataset of base categories, and the resulting few-shot learning system achieved competitive performance compared to previous work [16]. Su Y and other scholars proposed a few-shot hierarchical classification model based on multi-granularity relational networks, designed to take both intra-class similarity and inter-class relationships into account. Experimental results show that the model outperforms several state-of-the-art models in both flat and hierarchical settings; for example, on the tiered-ImageNet dataset, the accuracy of HMRN is improved by about 3.00% over the flat model [17].
Currently, FSL has been widely used in many fields, such as image recognition and detection, speech recognition and translation, spectral recognition, text categorization, and medical detection, and its classification accuracy keeps improving. However, FSL still has many unsolved problems hindering its development, such as how to better utilize prior knowledge to further ensure the generalization ability of the model. The research presented here contains two innovations. Firstly, a classification algorithm based on an image deformation network is proposed, which improves the performance of the model by modifying the auxiliary image selection method and the image fusion weights. Secondly, the study further introduces a dynamic adaptive fusion strategy to avoid overly fine segmentation of the target area.
3. Small Sample Classification Algorithm and Improvement Based on Image Morphing Meta-network
The research focuses on few-shot image classification algorithms based on image deformation networks, in which the image deformation network and the embedded image classification network are jointly trained in an end-to-end manner. In response to the problem of insufficient sample data in few-shot learning, an image deformation network is designed and improved, combining a relation network with the Euclidean distance to generate fused images with more prominent target features. Then, on the basis of the improved network, a dynamic adaptive fusion strategy is introduced to improve the image segmentation method, thereby better realizing the recognition and extraction of image features.
3.1. Small Sample-based Image Deformation Meta-network Modeling
The IDeMe-Net-based few-shot classification algorithm employs a simple and effective ML method for end-to-end image classification training of the image deformation network and the embedded network, using unsupervised training images to generate fused images that expand the dataset. Compared with traditional graph neural networks and convolutional neural networks, the advantage of the deformation meta-network model is that it can effectively handle few-shot learning problems and achieve high-precision classification of target categories through the image deformation network. In addition, the model has good generalization ability and learning efficiency, and can better adapt to the training environment. ML-based FSL can utilize prior knowledge to give the model a degree of learning ability so that it learns quickly on new tasks. The principle of the ML method is shown in Fig. 1.
Fig. 1. Schematic diagram of the principle of meta learning method.
According to Fig. 1, the ML method can be combined with other algorithm models at the input stage and can be used for few-shot learning. During learning, it acquires certain learning abilities through prior knowledge and improves them on new task blocks. By learning meta-information, parameter updates can be improved and the learning progress on new task blocks accelerated. The few-shot image classification problem is often described as the $N$-way $K$-shot problem, i.e., the support set contains $N$ classification labels, each with $K$ labeled images. For the $N$-way $K$-shot problem, $N$ classes that have not been trained on are randomly selected from the dataset, and $m$ images are randomly selected from each class to form the support set $S$. The query set $Q$ is constructed in the same way, as shown in Eq. (1) [18]:

$S = \{(I_i, y_i)\}_{i=1}^{N \times m}, \qquad Q = \{(I_j, y_j)\}_{j=1}^{N \times q}$
In Eq. (1), $m$ is the number of images selected per class for the support set, and $q$ is the number of images extracted from each class when constructing the query set $Q$. The few-shot image deformation meta-network model consists of two main parts, the image deformation network and the embedded network, which are designed to solve the FSL problem. The image deformation network is responsible for augmenting the dataset and is the data processing engine of the model. The overall architecture of the image deformation meta-network is shown in Fig. 2.
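To make the episode construction concrete, the following minimal Python sketch samples one $N$-way $K$-shot task; the function name `sample_episode` and the dictionary-based dataset layout are illustrative assumptions, not part of the original model.

```python
import random

def sample_episode(dataset, n_way, k_shot, q_query, rng=None):
    """Sample one N-way K-shot episode: a support set S and a query set Q.

    `dataset` maps class label -> list of images; classes and images are
    drawn without replacement, so S and Q never share an image.
    """
    rng = rng or random.Random()
    classes = rng.sample(sorted(dataset), n_way)           # N novel classes
    support, query = [], []
    for c in classes:
        images = rng.sample(dataset[c], k_shot + q_query)  # disjoint draw
        support += [(img, c) for img in images[:k_shot]]
        query += [(img, c) for img in images[k_shot:]]
    return support, query

# Toy dataset: 5 classes with 20 (dummy) images each.
data = {c: [f"img_{c}_{i}" for i in range(20)] for c in "ABCDE"}
S, Q = sample_episode(data, n_way=3, k_shot=1, q_query=5,
                      rng=random.Random(0))
# |S| = N*K = 3 and |Q| = N*q = 15, with no image in both sets.
```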
Fig. 2. The overall architecture of image morphing meta network model.
According to the image deformation meta-network structure in Fig. 2, the task is divided into a query set and a support set, which serve as the image support set and auxiliary images. Through the image deformation network, the two are linearly fused with weight vectors to obtain a fused image. Then, the newly generated fused image and the query set images are jointly input into the image classification network for classification training. The input of the entire classification network is the image and its true label, and the output is the predicted label for each image; the predicted label is compared with the true label to obtain the final test result. The main idea of the image deformation network is to generate fused images by linearly blending the support set and auxiliary images with weight vectors. All the images in the support set are ranked according to their similarity and then filtered. Thereafter, the support set images and the auxiliary images are fed into two identical feature extraction networks. The two resulting feature vectors are concatenated and input to the fully connected layer, whose output is a weight vector of length 9. This length-9 weight vector is used as the weight matrix for image fusion, and the linear fusion of the images is shown in Eq. (2):

$I_{syn,q} = w_q \odot I_{probe,q} + (1 - w_q) \odot I_{gallery,q}$
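The patch-wise linear fusion of Eq. (2) can be sketched with NumPy as follows; the helper `fuse_images` and the toy images are hypothetical, and the $3 \times 3$ grid with one weight per region follows the length-9 weight vector described above.

```python
import numpy as np

def fuse_images(probe, gallery, weights, grid=3):
    """Patch-wise linear fusion: the image is split into a grid x grid
    lattice and each region r is blended as
        I_syn = w_r * I_probe + (1 - w_r) * I_gallery.
    `weights` has length grid*grid (9 for the paper's 3 x 3 scheme).
    """
    assert probe.shape == gallery.shape
    h, w = probe.shape[:2]
    ys = np.linspace(0, h, grid + 1, dtype=int)   # region boundaries
    xs = np.linspace(0, w, grid + 1, dtype=int)
    fused = np.empty_like(probe, dtype=float)
    for r in range(grid):
        for c in range(grid):
            wr = weights[r * grid + c]
            sl = np.s_[ys[r]:ys[r + 1], xs[c]:xs[c + 1]]
            fused[sl] = wr * probe[sl] + (1 - wr) * gallery[sl]
    return fused

probe = np.ones((6, 6, 3))        # toy "support" image
gallery = np.zeros((6, 6, 3))     # toy "auxiliary" image
w = np.full(9, 0.75)              # one weight per 3x3 region
syn = fuse_images(probe, gallery, w)
```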
In Eq. (2), $I_{syn,q}$ denotes the synthesized (fused) image and $w_q$ denotes the weight vector; $I_{probe,q}$ and $I_{gallery,q}$ denote the support (probe) image and the auxiliary (gallery) image, respectively. In order to ensure that the image deformation network is able to create visually or semantically different samples, the study designed an optimized prototype loss function, which determines whether the deformed pictures fit their expected categories. After the fusion operation of the image deformation network, a new support set $\tilde{S}$ is obtained, which is an expanded version of the support set built from the support, auxiliary and query samples. The prototype vector $p_\theta^c$ for each class in $\tilde{S}$ is calculated as shown in Eq. (3):

$p_\theta^c = \frac{1}{Z} \sum_{(I_i, y_i) \in \tilde{S}} [y_i = c]\, f_{\theta_{emb}}(I_i)$
In Eq. (3), $p_\theta^c$ represents the prototype vector of class $c$, and $I_i$ is an image in $\tilde{S}$ with label $y_i$. $Z = \sum_{(I_i, y_i) \in \tilde{S}} [y_i = c]$ is a normalization factor (the number of samples of class $c$), which prevents the value from going out of bounds, and $f_{\theta_{emb}}$ denotes the feature extraction module. The samples from the new support set $\tilde{S}$ are fed into the few-shot classifier along with the images from the query set; their distances are calculated by the metric function, and the softmax classifier outputs a probability vector $P$ that determines the probability of each image category. Given any image $I_i \in Q$, the probability of it belonging to category $c$ among the $N$ categories is calculated as shown in Eq. (4):

$p_\theta(y_i = c \mid I_i) = \dfrac{\exp\left(-d\left(f_{\theta_{emb}}(I_i), p_\theta^c\right)\right)}{\sum_{c'=1}^{N} \exp\left(-d\left(f_{\theta_{emb}}(I_i), p_\theta^{c'}\right)\right)}$
In Eq. (4), $p_\theta(y_i \mid I_i)$ is the probability that the few-shot classifier assigns image $I_i$ to each category, and $d(\cdot, \cdot)$ is the distance metric. In the embedded network, the fused images together with the support set images form the new expanded support set, which is generated after the image deformation network training. The new support set images are fed into the embedded network along with the query images, and their feature representations are generated by the ResNet-18 residual network, while the cross-entropy loss function combined with the softmax classifier yields the classification accuracies and losses for the query set images for category judgment. During this process, backpropagation is continuously executed to adjust and update the entire network and reduce the experimental error. The cross-entropy loss function is used to guide the training, and its minimization is shown in Eq. (5):

$\min_{\theta} \; -\sum_{(I_i, y_i) \in G} \log p_\theta(y_i \mid I_i)$
In Eq. (5), $G$ represents the query (test) set. The smaller the cross-entropy, the closer the two probability distributions are.
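The prototype computation and the distance-softmax classification described in this subsection (Eqs. (3)-(4)) can be sketched as follows; the function names and the toy 2-D features are illustrative assumptions.

```python
import numpy as np

def prototypes(features, labels, n_classes):
    """Class prototype p_c = mean of the embedded features with label c
    (Eq. 3); the normalizer Z is just the per-class sample count."""
    return np.stack([features[labels == c].mean(axis=0)
                     for c in range(n_classes)])

def classify(query_feat, protos):
    """Softmax over negative Euclidean distances to each prototype (Eq. 4)."""
    d = np.linalg.norm(protos - query_feat, axis=1)
    e = np.exp(-d - np.max(-d))          # numerically stable softmax
    return e / e.sum()

feats = np.array([[0., 0.], [0., 2.], [4., 0.], [4., 2.]])
labels = np.array([0, 0, 1, 1])
protos = prototypes(feats, labels, n_classes=2)   # [[0, 1], [4, 1]]
p = classify(np.array([0.5, 1.0]), protos)        # query lies near class 0
```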
3.2. Small Sample Image Classification Algorithm Based on Improved Deformed Meta-network
In the few-shot classification algorithm based on the image deformation meta-network, the image deformation network and the embedded network are trained end-to-end for image classification, and fused images are generated in an unsupervised manner. However, there is still room for optimizing the network in the way the auxiliary image set is selected and the weight vectors are generated. The improvement strategy proposed in the study focuses on the image deformation network within the meta-network. The optimization of the image deformation network has two main aspects: improving the selection method of auxiliary images and improving the weight generation network. The framework of the few-shot image classification network based on the improved deformation meta-network is shown in Fig. 3.
Fig. 3. Framework of the few-shot image classification network based on the improved deformation meta-network.
The key to improving classification accuracy is finding auxiliary pictures with higher similarity. For each few-shot classification task, the Euclidean distances between the support set pictures and all auxiliary pictures are calculated and sorted, and a few pictures are then randomly selected from those with the smallest distances (ranked at the front) for fusion with the support set, thereby expanding the support set. In the original experiments, images were randomly selected and merged from among the top 30 ranked images. The Euclidean distance $D_O(x, y)$ is calculated as shown in Eq. (6):

$D_O(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
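The top-30 nearest-neighbour selection of auxiliary images described above can be sketched as follows; `select_auxiliary` and the toy 1-D feature vectors are hypothetical.

```python
import numpy as np

def select_auxiliary(support_vec, gallery_vecs, pool=30, pick=5, rng=None):
    """Rank all auxiliary (gallery) images by Euclidean distance to the
    support image (Eq. 6) and randomly pick a few from the `pool` nearest,
    mirroring the paper's "random choice among the top 30" scheme."""
    rng = rng or np.random.default_rng()
    d = np.linalg.norm(gallery_vecs - support_vec, axis=1)
    nearest = np.argsort(d)[:pool]          # indices of the closest images
    return rng.choice(nearest, size=pick, replace=False)

gallery = np.arange(100, dtype=float)[:, None]   # 100 1-D "feature vectors"
support = np.array([0.0])
idx = select_auxiliary(support, gallery, pool=30, pick=5,
                       rng=np.random.default_rng(0))
# Every picked index comes from the 30 vectors nearest to the support.
```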
In Eq. (6), $x_i$ and $y_i$ denote the components of two $n$-dimensional vectors. This method effectively narrows the scope of image selection, and the images at the front of the ranking are indeed highly similar; however, because image similarity is quantified with the traditional Euclidean distance alone, the similarity metric still has limitations. To solve this problem, on the basis of the original metric learning using the Euclidean distance alone, the study proposes a method that combines a relation network with the Euclidean distance, and both jointly participate in the selection of auxiliary pictures. In the relation network, the relation module is divided into an embedding module and a relevance module. The embedding module extracts and concatenates the image feature information in the dataset, and this feature information is then used as the input of the relevance module. The output of the module is the relation score between the images in the training set and the images in the query set, and the classification of the images is accomplished based on this score. The relation score $r_{ij}$ is calculated as shown in Eq. (7):

$r_{ij} = g_\theta\left(C\left(f(x_i), f(x_j)\right)\right)$
In Eq. (7), $g_\theta$ denotes the relation network, which determines whether there is a correlation between the samples, $f$ denotes the embedding module, and $C$ denotes the concatenation (splicing) operation. The loss function of the relation network is shown in Eq. (8):

$L = \sum_{i} \sum_{j} \left(r_{ij} - \mathbf{1}[y_i = y_j]\right)^2$

In Eq. (8), the indicator $\mathbf{1}[y_i = y_j]$ outputs 1 when $y_i = y_j$ holds, and 0 otherwise. The structure of the relation network is shown schematically in Fig. 4.
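A minimal numerical sketch of the relation score of Eq. (7) follows, with the relation module $g_\theta$ reduced to a tiny two-layer network; the weights here are random and purely illustrative, not trained parameters of the paper's model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relation_score(f_i, f_j, W1, b1, W2, b2):
    """Relation score r_ij = g_theta(C(f_i, f_j)) (Eq. 7): the two feature
    vectors are concatenated (operator C) and passed through a tiny
    two-layer network g_theta whose sigmoid output lies in (0, 1)."""
    z = np.concatenate([f_i, f_j])           # splicing operation C
    h = np.maximum(0.0, W1 @ z + b1)         # ReLU hidden layer
    return float(sigmoid(W2 @ h + b2))       # scalar score

rng = np.random.default_rng(1)
W1, b1 = rng.standard_normal((8, 6)), np.zeros(8)   # toy weights, 3-D features
W2, b2 = rng.standard_normal(8), 0.0
r = relation_score(np.ones(3), np.ones(3), W1, b1, W2, b2)
```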
Fig. 4. Schematic diagram of the structure of the relationship network.
The relation network designed for the study consists of two convolutional blocks, two fully connected layers, and one activation layer, where each convolutional block is composed of a convolutional layer, a batch normalization layer, an activation function, and a max pooling layer. The properties of the relation network allow the model to learn and understand the relationships between multiple samples at the same time, which further improves the quality of the fused images and enhances the classification accuracy. However, the $3 \times 3$ image division makes the RGB channels in each region share the same weight value, which does not fully utilize the channel information. Therefore, the study adds per-RGB-channel weights to the image deformation weight vector. When the input image is linearly fused with the weight vector, the weights of the more important channels in the image are deliberately increased. In this way, the network pays more attention to useful feature information and ignores feature information that has little impact on the result, thus improving the image fusion effect.
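The per-channel weighting described above can be sketched as follows: the $3 \times 3$ grid now carries $9 \times 3 = 27$ weights, one per region and RGB channel. The helper `fuse_rgb` and the toy images are illustrative assumptions.

```python
import numpy as np

def fuse_rgb(probe, gallery, weights):
    """Channel-aware fusion: each of the 3 x 3 regions gets one weight per
    RGB channel (27 values instead of 9), so an important channel in a
    region can be emphasized independently of the other channels."""
    h, w, _ = probe.shape
    ys = np.linspace(0, h, 4, dtype=int)
    xs = np.linspace(0, w, 4, dtype=int)
    W = weights.reshape(3, 3, 3)            # (grid_row, grid_col, channel)
    fused = np.empty_like(probe, dtype=float)
    for r in range(3):
        for c in range(3):
            wr = W[r, c]                    # per-channel weights, shape (3,)
            sl = np.s_[ys[r]:ys[r + 1], xs[c]:xs[c + 1]]
            fused[sl] = wr * probe[sl] + (1 - wr) * gallery[sl]
    return fused

probe, gallery = np.ones((6, 6, 3)), np.zeros((6, 6, 3))
w = np.tile([1.0, 0.5, 0.0], 9)             # R kept, G blended, B replaced
syn = fuse_rgb(probe, gallery, w)
```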
3.3. Improvement of Image Classification Algorithm Based on Dynamic Adaptive Fusion
Strategy
Combining the relation network with the Euclidean distance in the selection of auxiliary images is beneficial for selecting images with higher similarity to the support set images. However, this approach still has some drawbacks in the process of generating fused images. Therefore, the study further proposes an innovative dynamic adaptive fusion strategy, which modifies the image segmentation method, accurately divides the required target features into one or several regions, makes better use of the effective information in the image, and avoids excessive segmentation of image features. This strategy optimizes the image deformation network by reforming the traditional auxiliary image selection method and correcting the weight vectors. The dynamic adaptive fusion strategy uses dynamic, unequal cuts to target the range of the target features for more effective image segmentation. Although this strategy has similarities with traditional localization, it is unique in that it frames the main features while maintaining their integrity, thus avoiding over-slicing. The structure of the adaptive network is shown in Fig. 5.
Fig. 5. Structure diagram of adaptive network.
The dynamic adaptive network can be regarded as a neural-network-based segmentation module containing a feature extraction layer and a proportional position generation layer. The inputs of this module are the support set images and the auxiliary images, and the output is the segmentation positions of the image. The module uses the self-learning ability of the neural network to capture the global and local features of the image to guide segmentation. The feature vectors are processed by the proportional position generation layer to output a one-dimensional vector of length $4W$, which is used as the position scale values for image segmentation. These position scale values need to be mapped to the corresponding points in the image to realize segmentation. The vector is first processed with the $\tanh$ activation function, and the converted values are then mapped to the regions on either side of the image center, as shown in Eq. (9):

$P = \frac{L}{2}\left(1 + \tanh(\tau v)\right)$
In Eq. (9), $P$ (giving the coordinates $(P_x, P_y)$) is the position obtained by the mapping, $L$ is the width or height of the image, $v$ is the raw output of the proportional position generation layer, and $\tau$ is the scaling parameter. The resulting dynamic matrix is then merged with the weight vectors to generate a new weight matrix that adjusts the fusion region of the image. The structure of the improved weight network is shown in Fig. 6.
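One plausible reading of the tanh position mapping (squashing unbounded layer outputs into positions on either side of the image center) can be sketched as follows; the exact form $P = \frac{L}{2}(1 + \tanh(\tau v))$ is an assumption consistent with the description, not a confirmed formula from the paper.

```python
import numpy as np

def to_positions(raw, length, tau=1.0):
    """Map raw network outputs to cut positions on either side of the
    image center via tanh:
        P = L/2 * (1 + tanh(tau * raw))
    tanh squashes to (-1, 1), so positions always fall inside [0, L]."""
    return length / 2.0 * (1.0 + np.tanh(tau * raw))

L = 84                                   # image side length
raw = np.array([-10.0, 0.0, 10.0])       # unbounded layer outputs
pos = to_positions(raw, L)
# raw = 0 lands exactly on the image center L/2; extremes stay in [0, L].
```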
Fig. 6. Structure diagram of the improved weight network.
As can be seen in Fig. 6, after the image is divided, the regions are of unequal size; the target region is carefully protected from damage, and its weight value is deliberately increased so as to improve the quality of the fused image for the subsequent few-shot classification. Under this dynamic adaptive fusion strategy, when different feature samples are processed, the input dynamic matrix undergoes various transformations: the image is still divided into nine regions, but these regions are of unequal size. This strategy can effectively perform a reasonable linear fusion of two images. The evaluation metrics for model performance include the precision rate, recall rate and accuracy rate. The precision rate is calculated as shown in Eq. (10):

$\text{Precision} = \frac{TP}{TP + FP}$
In Eq. (10), $TP$ represents True Positives and $FP$ represents False Positives. The recall rate represents the proportion of correctly predicted positive samples among all actual positive samples, i.e., the ratio of correct positive predictions to the sum of correct and incorrect predictions of positive samples, calculated as shown in Eq. (11):

$\text{Recall} = \frac{TP}{TP + FN}$

In Eq. (11), $FN$ represents False Negatives. The accuracy rate represents the proportion of correctly classified images in the total sample, calculated as shown in Eq. (12):

$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$

In Eq. (12), $TN$ represents True Negatives. In few-shot image classification tasks, accuracy is an important evaluation metric for measuring the performance of network models. The study uses the image classification accuracy on the query set to judge the performance of the few-shot classification network. For the dataset $D$, the accuracy rate is calculated as shown in Eq. (13):

$\text{Acc}(D) = \frac{1}{n_r} \sum_{i=1}^{n_r} G_r(\hat{y}_i = y_i)$

In Eq. (13), $n_r$ is the number of samples in the dataset, and $G_r$ is the indicator function, which is 1 when the predicted label $\hat{y}_i$ equals the true label $y_i$ and 0 otherwise.
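The evaluation metrics of Eqs. (10)-(13) can be computed directly from the confusion counts, as in this short sketch; the function names are illustrative.

```python
def metrics(tp, fp, fn, tn):
    """Precision, recall and accuracy from confusion counts (Eqs. 10-12)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy

def query_accuracy(predicted, true):
    """Eq. (13): fraction of query images whose predicted label matches
    the true label (the indicator G_r averaged over the n_r samples)."""
    return sum(p == t for p, t in zip(predicted, true)) / len(true)

p, r, a = metrics(tp=8, fp=2, fn=2, tn=8)        # p = r = a = 0.8
qa = query_accuracy([0, 1, 2, 1], [0, 1, 2, 2])  # 3 of 4 correct -> 0.75
```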
4. Small Sample Classification Algorithm Based on Image Deformation Meta-network and
Improvement Effect
To validate the performance of the two improvement strategies proposed for the FSL image classification algorithm, the study selected the Prototypical Network, Meta-SGD (ML with Stochastic Gradient Descent), Matching Network, Relation Network, and Model-Agnostic Meta-Learning (MAML) as comparison algorithms, and performed classification validation experiments on the Mini-ImageNet dataset.
4.1. Classification Effect of Image Deformation Meta-network Model Based on Small
Samples
In order to verify the effect of the proposed model on few-shot image classification, experiments were conducted on a Windows 10 system with an NVIDIA RTX 3070 graphics card and an Intel i9 processor; the experimental analysis was completed in MATLAB. ImageNet was selected as the source dataset, which contains over 14 million manually annotated image URLs covering more than 20,000 categories. Mini-ImageNet was selected for the experiments; it contains 60,000 $84 \times 84$ color images divided into 100 categories. The image division scheme determines the output length of the fully connected layer in the image deformation network, so the accuracies of different division schemes, namely pixel-level, $1 \times 1$, $3 \times 3$, $5 \times 5$ and $7 \times 7$, were compared. As shown in Fig. 7, as the number of samples increases, the classification accuracy trends upward for the pixel-level, $1 \times 1$, $3 \times 3$, $5 \times 5$ and $7 \times 7$ divisions alike. This indicates that increasing the number of samples can effectively improve the performance of the model, i.e., increase the classification accuracy. However, the $5 \times 5$ and $7 \times 7$ division methods are prone to over-segmentation, which destroys the main semantics of the image, while the $3 \times 3$ division not only ensures that key region information is preserved but also increases sample diversity, better guaranteeing the image classification accuracy. Therefore, the subsequent experiments directly use the $3 \times 3$ division scheme.
Fig. 7. Accuracy obtained by different image segmentation schemes.
The study conducted comparative experiments on the Mini-ImageNet dataset using different image classification algorithms, and the outcomes are shown in Fig. 8. From Fig. 8, the IDeMe Network achieves the best results in both the 1-shot and 5-shot classification tasks, with accuracies of 59.17% and 74.66%, respectively. This demonstrates the superiority and stability of the IDeMe Network algorithm in dealing with image classification problems. Considering the 1-shot and 5-shot results together, the IDeMe Network achieves the best performance and highest accuracy on the image classification task.
Fig. 8. Classification results of the few-shot image deformation meta-network model.
4.2. Experimental Results of Small Sample Image Classification Algorithm Based on
Improved Deformed Meta-network
To verify the performance of the proposed improvements, several common classification models were selected for comparison experiments. The improved image deformation meta-network uses the ResNet-18 framework, which has 17 convolutional layers and 1 fully connected layer. The research is conducted on the Mini-ImageNet dataset, mainly on the 5-way 1-shot and 5-way 5-shot FSL tasks. In the experiments, the Mini-ImageNet classes are split in the ratio 64:16:20, with 64 categories used as the training set, 16 categories as the validation set, and the remaining 20 categories as the test set. The experimental parameter settings for the two improved algorithms are shown in Table 1.
Table 1. Experimental parameter settings.
| Number | Parameter | IDeMe-R Network | IDeMe-RD Network |
| --- | --- | --- | --- |
| 1 | Total number of iterations | 100 | 50 |
| 2 | Initial learning rate | 0.001/0.01 | 0.001 |
| 3 | Learning decay rate | 0.5 | 0.5 |
| 4 | Batch size | 32 | 4 |
| 5 | Class number of training set | 64 | 64 |
| 6 | Class number of validation set | 16 | 16 |
| 7 | Class number of test set | 20 | 20 |
The performance comparison of several commonly used classification models on the Mini-ImageNet
dataset is shown in Fig. 9. From Fig. 9, for both 1-shot and 5-shot tasks, IDeMe-R Network outperforms all other algorithms,
with accuracies of 59.92% and 75.23%, respectively. This indicates that IDeMe-R Network
has a significant advantage in dealing with small-sample image classification problems.
In contrast, the performance of the original IDeMe Network is 59.17% and 74.66% in
the 1-shot and 5-shot cases, respectively, which is slightly inferior to that of the
IDeMe-R Network. This indicates that the improved deformed meta-network (IDeMe-R Network)
improves its performance, and verifies the effectiveness and feasibility of the algorithmic
improvement.
Fig. 9. Performance comparison of different classification algorithms on the Mini-ImageNet dataset.
4.3. Improvement Effect of Image Classification Algorithm Based on Dynamic Adaptive
Fusion Strategy
5-way 1-shot experiments were conducted on several commonly used FSL models and on the further improved classification algorithm, based on the Mini-ImageNet dataset, to compare and verify network performance. The 5-way 5-shot experiments were not performed because the parameters of the network module were further increased beyond the limits of the available hardware. The model training process of the network with the added dynamic adaptive fusion strategy (IDeMe-RD Network) is shown in Fig. 10. From the figure, both improved networks converge faster and are more stable. There is no parameter update in the testing phase, so the network remains adapted to the training data.
Fig. 10. The model training process of the network with the added dynamic adaptive
fusion strategy (IDeMe-RD Network).
The results of the comparison experiments between several commonly used classification models and the further improved IDeMe-RD Network are given in Table 2. From Table 2, the IDeMe-RD Network with the dynamic adaptive fusion strategy performs best on the 5-way 1-shot few-shot image classification task with an accuracy of 60.15% $\pm$ 0.28, significantly better than the other methods. This indicates that the dynamic adaptive fusion strategy plays a positive role in improving the classification accuracy. The IDeMe-R Network and IDeMe Network both adopt ResNet-18 in their structure, reaching accuracies of 59.83% $\pm$ 0.44 and 59.12% $\pm$ 0.86, respectively. This reflects the superiority of ResNet-18 in image classification, and also indicates that the IDeMe-R Network performs better than the IDeMe Network. The Matching Network has the weakest performance, with an accuracy of only 43.58% $\pm$ 0.85. These findings confirm the validity of the further improvement of the IDeMe-R Network. The differences in training error and time consumption among the models were also compared. According to the results, the IDeMe-RD network performed best in 5-way 1-shot training with a training error of 0.162, followed by the IDeMe-R network with a training error of 0.172; the training errors of the IDeMe network, MAML, and the Relation Network are 0.182, 0.191, and 0.213, respectively. Comparing the time consumption of the models in actual image classification, the IDeMe-RD and IDeMe-R networks have the lowest time consumption, at 16.150 s and 18.150 s respectively, while the worst performing is the Prototypical network at 26.120 s. It can be seen that the IDeMe-RD network with the adaptive fusion strategy performs best in classification accuracy, error, and training time.
Table 2. Improvement of image classification algorithm based on dynamic adaptive fusion
strategy.
| Number | Module | Structure of feature extractor | 5-way 1-shot (%) | Training error (%) | Training time (s) |
| --- | --- | --- | --- | --- | --- |
| 1 | Prototypical network | 64(3)-64(3)-64(3)-64(3) | 49.590$\pm$0.750 | 0.242 | 26.120 |
| 2 | Matching Network | Inception Network | 43.580$\pm$0.850 | 0.221 | 24.120 |
| 3 | Relation Network | 64(3)-64(3)-64(3)-64(3) | 56.980$\pm$0.920 | 0.213 | 23.120 |
| 4 | MAML | 32(3)-32(3)-32(3)-32(3) | 48.690$\pm$1.800 | 0.191 | 20.120 |
| 5 | IDeMe Network | ResNet-18 | 59.120$\pm$0.860 | 0.182 | 19.240 |
| 6 | IDeMe-R Network | ResNet-18 | 59.830$\pm$0.440 | 0.172 | 18.150 |
| 7 | IDeMe-RD Network | ResNet-18 | 60.150$\pm$0.280 | 0.162 | 16.150 |
Finally, the IDeMe Network, IDeMe-R Network, and IDeMe-RD Network were selected for Precision-Recall (P-R) curve analysis. The larger the area under the P-R curve, the better the overall training effect of the model. Fig. 11(a) shows the test results under the 5-way 1-shot setting for the IDeMe Network, IDeMe-R Network, and IDeMe-RD Network. Judging from the P-R curve area, the IDeMe-RD network has a larger curve area and better model performance; compared to the IDeMe-R network and the IDeMe network, the training performance of the IDeMe-RD network is improved by 8.35% and 8.21%, respectively. Fig. 11(b) shows the test results under the 5-way 5-shot setting, which are similar to the 5-way 1-shot results: the IDeMe-RD network has a larger area under the P-R curve and better training performance, improving on the IDeMe-R network and the IDeMe network by 7.32% and 9.21%, respectively. It can be seen that the technology proposed in this study has outstanding application effects in the field of image classification, meeting the requirements of image data processing.
Fig. 11. Comparison of P-R curve performance under different data.