1. Introduction
With the rapid development of deep learning and computer vision, which have achieved notable success in data mining, natural language processing, and other fields, image classification has become the basis for core applications such as image detection, segmentation, object tracking, and behavior analysis [1, 2]. However, in practical image data processing, traditional deep learning tasks require large-scale datasets, complex networks, powerful computing resources, and a large amount of high-quality labeled data, resulting in high training costs and complex workflows [3, 4]. In this context, Few-Shot Learning (FSL) has emerged, with the goal of improving the generalization ability of models from minimal labeled data and meeting training requirements when data are sparse. In the research of Vázquez CG and colleagues, classifying abnormal heart image data required extensive data labeling, but high-quality labeled image data were difficult to obtain. FSL was therefore used to address this problem, and the effects of label noise and self-learning label correction on heart image classification were examined separately; an optimized FSL model was used for multi-label learning, which improved accuracy by 5% when training on the MNIST dataset [5]. However, FSL still faces great challenges in image classification: in particular, the accuracy and generalization ability of the model are severely limited when very few samples are available [6]. This study focuses on applying FSL to image classification. It conducts an in-depth study and improvement of the few-shot image classification algorithm based on the Image Deformation Meta-Network (IDeMe-Net), aiming to improve the generalization ability of the model so that it achieves good classification performance with smaller samples, and to provide valuable guidance for improving FSL image classification algorithms [7]. The main contributions of the research are twofold. First, the study provides a new solution for image classification problems in FSL, improving the model's generalization ability. Second, the proposed improvement strategies and the dynamic adaptive fusion strategy not only improve the performance of the model, but also provide a reference for further research in the field of image classification.
2. Related Work
FSL is characterized by using a very small amount of labeled data to acquire problem-solving ability, learning the commonalities between different tasks rather than solving each task in isolation. Zhou X and other scholars proposed a few-shot learning model based on Siamese Convolutional Neural Networks, which constructs a Siamese convolutional coding network to measure the distance between input samples through an optimized feature representation. Experiments demonstrated that the model significantly reduced the false alarm rate and improved the F1 score in intrusion signal detection for the security protection of industrial network systems [8]. Chen et al. proposed a novel hierarchical Graph Neural Network (GNN) for few-shot learning, which consists of three parts, bottom-up inference, top-down inference, and skip connections, to achieve efficient learning of multilevel relationships. Experimental results on benchmark datasets show that the proposed method significantly outperforms other state-of-the-art GNN-based methods [9]. Zhang et al. proposed transforming the features extracted by a self-supervised feature extractor into a Gaussian distribution in order to minimize mismatches in feature distributions, thereby significantly improving the meta-training of graph networks. Experimental results show that their method significantly outperforms the existing optimal results in both fully supervised and semi-supervised settings, yielding a 12% improvement over the baseline performance [10]. Song et al. addressed few-shot learning in computer vision by performing spatial attention simultaneously in the image and embedding spaces, and by introducing a Meta-Learning (ML) module to adaptively fuse the local features of each individual embedding. Experiments showed that although designing spatial attention methods for few-shot learning is a very complex task, the method is effective [11]. Li et al. used multilevel second-order attention representations and contextual similarity to address how to learn a robust representation and how to select and label unlabeled instances to build discriminative classifiers. Extensive experiments on four commonly used benchmark datasets show that this simple yet effective method is comparable to the available state-of-the-art methods [12].
With the rapid development of deep learning, learning methods for few-shot classification are becoming increasingly rich; they fall into three main categories: data-augmentation-based, ML-based, and metric-based learning. Researchers such as Wen J have proposed a novel FSL method called multi-scale metric learning, which introduces a feature pyramid structure for multi-scale feature embedding and performs hierarchical metric learning through a multi-scale relation-generating network. Experimental results on the Mini-ImageNet and Tiered-ImageNet datasets demonstrate that the method achieves superior performance on the few-shot learning problem [13]. Zhang et al. proposed an efficient method based on stacked sparse auto-encoders and Siamese networks for diagnosing short-circuit faults in permanent magnet synchronous motors. The method employs an encoder to extract sparse features from a limited number of samples and uses a Siamese network to determine the similarity between given samples, which transforms the fault diagnosis problem into a classification problem under few-shot learning [14]. Scholars such as Zhang have proposed a deep-neural-network-based few-shot learning method, which trains a twin (Siamese) neural network on pairs of samples from the same or different categories to learn similarity and dissimilarity. Experimental results on a standard benchmark dataset for bearing fault diagnosis show that this few-shot learning method is more effective for fault diagnosis with limited data availability [15]. Das and Lee proposed a multilayer neural network architecture for few-shot image identification of new categories. This architecture encodes transferable knowledge extracted from a large annotated dataset of base categories, and the resulting few-shot learning system achieved competitive performance compared to previous work [16]. Su Y and other scholars proposed a few-shot hierarchical classification model based on multi-granularity relational networks, designed to take both intra-class similarity and inter-class relationships into account. Experimental results show that the model outperforms several state-of-the-art models in both flat and hierarchical settings; for example, on the tiered-ImageNet dataset, the accuracy of HMRN is improved by about 3.00% over the flat model [17].
Currently, FSL has been widely used in many fields, such as image recognition and detection, speech recognition and translation, spectral recognition, text categorization, and medical detection, and its classification accuracy keeps improving. However, FSL still has many unsolved problems hindering its development, such as how to better utilize prior knowledge to further ensure the generalization ability of the model. The research presented here contains two innovations. Firstly, a classification algorithm based on an image deformation network is proposed, which improves the performance of the model by modifying the auxiliary image selection method and the image fusion weights. Secondly, the study further introduces a dynamic adaptive fusion strategy to avoid overly fine segmentation of the target area.
3. Small Sample Classification Algorithm and Improvement Based on Image Morphing Meta-network
The research focuses on few-shot image classification algorithms based on image deformation networks, in which the image deformation network and the embedded image classification network are jointly trained in an end-to-end manner. In response to the problem of insufficient sample data in few-shot learning, an image deformation network is designed and improved, combining a relation network with the Euclidean distance to generate fused images with more prominent target features. Then, on the basis of the improved network, a dynamic adaptive fusion strategy is introduced to improve the image segmentation method, thereby better realizing the recognition and extraction of image features.
3.1. Small Sample-based Image Deformation Meta-network Modeling
The IDeMe-Net-based few-shot classification algorithm employs a simple and effective ML method for end-to-end image classification training of the image deformation network and the embedded network, using unsupervised training images to generate fused images that expand the dataset. Compared with traditional graph neural networks and convolutional neural networks, the advantage of the deformation meta-network model is that it can effectively handle few-shot learning problems and achieve high-precision classification of target categories through the image deformation network. In addition, the model has good generalization ability and learning efficiency, and can better adapt to the training environment. ML-based FSL can utilize prior knowledge to give the model a degree of learning ability so that it learns quickly on new tasks. The principle of the ML method is shown in Fig. 1.
Fig. 1. Schematic diagram of the principle of meta learning method.
According to Fig. 1, the ML method can be combined with other algorithm models at the input stage and can be used for few-shot learning. During learning, it acquires certain learning abilities through prior knowledge and improves them on new task blocks. By learning meta-information, parameter updates can be improved and the learning progress on new task blocks accelerated. The few-shot image classification problem is often described as the $N$-way $K$-shot problem, i.e., the support set contains $N$ classification labels, each with $K$ labeled images. For the $N$-way $K$-shot problem, $N$ classes that have not been trained on are randomly selected from the dataset, and $m$ images are randomly selected from each class to form the support set $S$. The query set $Q$ is constructed in the same way, as shown in Eq. (1) [18]:

$S = \{(I_i, y_i)\}_{i=1}^{N \times m}, \qquad Q = \{(I_j, y_j)\}_{j=1}^{N \times q}$
In Eq. (1), $m$ is the number of images selected per class for the support set, and $q$ is the number of images extracted from each class when constructing the query set $Q$. The few-shot image deformation meta-network model consists of two main parts, the image deformation network and the embedded network, which are designed to solve the FSL problem. The image deformation network is responsible for augmenting the dataset and is the data processing engine of the model. The overall architecture of the image deformation meta-network is shown in Fig. 2.
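To make the episode construction concrete, the following minimal Python sketch samples one $N$-way $K$-shot task; the function name `sample_episode` and the dictionary-based dataset layout are illustrative assumptions, not part of the original model.

```python
import random

def sample_episode(dataset, n_way, k_shot, q_query, rng=None):
    """Sample one N-way K-shot episode: a support set S and a query set Q.

    `dataset` maps class label -> list of images; classes and images are
    drawn without replacement, so S and Q never share an image.
    """
    rng = rng or random.Random()
    classes = rng.sample(sorted(dataset), n_way)           # N novel classes
    support, query = [], []
    for c in classes:
        images = rng.sample(dataset[c], k_shot + q_query)  # disjoint draw
        support += [(img, c) for img in images[:k_shot]]
        query += [(img, c) for img in images[k_shot:]]
    return support, query

# Toy dataset: 5 classes with 20 (dummy) images each.
data = {c: [f"img_{c}_{i}" for i in range(20)] for c in "ABCDE"}
S, Q = sample_episode(data, n_way=3, k_shot=1, q_query=5,
                      rng=random.Random(0))
# |S| = N*K = 3 and |Q| = N*q = 15, with no image in both sets.
```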
Fig. 2. The overall architecture of image morphing meta network model.
According to the image deformation meta-network structure in Fig. 2, the task is divided into a query set and a support set, which serve as the image support set and auxiliary images. Through the image deformation network, the two are linearly fused with weight vectors to obtain a fused image. Then, the newly generated fused image and the query set images are jointly input into the image classification network for classification training. The input of the entire classification network is the image and its true label, and the output is the predicted label for each image; the predicted label is compared with the true label to obtain the final test result. The main idea of the image deformation network is to generate fused images by linearly blending the support set and auxiliary images with weight vectors. All the images in the support set are ranked according to their similarity and then filtered. Thereafter, the support set images and the auxiliary images are fed into two identical feature extraction networks. The two resulting feature vectors are concatenated and input to the fully connected layer, whose output is a weight vector of length 9. This length-9 weight vector is used as the weight matrix for image fusion, and the linear fusion of the images is shown in Eq. (2):

$I_{syn,q} = w_q \odot I_{probe,q} + (1 - w_q) \odot I_{gallery,q}$
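The patch-wise linear fusion of Eq. (2) can be sketched with NumPy as follows; the helper `fuse_images` and the toy images are hypothetical, and the $3 \times 3$ grid with one weight per region follows the length-9 weight vector described above.

```python
import numpy as np

def fuse_images(probe, gallery, weights, grid=3):
    """Patch-wise linear fusion: the image is split into a grid x grid
    lattice and each region r is blended as
        I_syn = w_r * I_probe + (1 - w_r) * I_gallery.
    `weights` has length grid*grid (9 for the paper's 3 x 3 scheme).
    """
    assert probe.shape == gallery.shape
    h, w = probe.shape[:2]
    ys = np.linspace(0, h, grid + 1, dtype=int)   # region boundaries
    xs = np.linspace(0, w, grid + 1, dtype=int)
    fused = np.empty_like(probe, dtype=float)
    for r in range(grid):
        for c in range(grid):
            wr = weights[r * grid + c]
            sl = np.s_[ys[r]:ys[r + 1], xs[c]:xs[c + 1]]
            fused[sl] = wr * probe[sl] + (1 - wr) * gallery[sl]
    return fused

probe = np.ones((6, 6, 3))        # toy "support" image
gallery = np.zeros((6, 6, 3))     # toy "auxiliary" image
w = np.full(9, 0.75)              # one weight per 3x3 region
syn = fuse_images(probe, gallery, w)
```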
In Eq. (2), $I_{syn,q}$ denotes the synthesized (fused) image and $w_q$ denotes the weight vector; $I_{probe,q}$ and $I_{gallery,q}$ denote the support (probe) image and the auxiliary (gallery) image, respectively. In order to ensure that the image deformation network is able to create visually or semantically different samples, the study designed an optimized prototype loss function, which determines whether the deformed pictures fit their expected categories. After the fusion operation of the image deformation network, a new support set $\tilde{S}$ is obtained, which is an expanded version of the support set built from the support, auxiliary and query samples. The prototype vector $p_\theta^c$ for each class in $\tilde{S}$ is calculated as shown in Eq. (3):

$p_\theta^c = \frac{1}{Z} \sum_{(I_i, y_i) \in \tilde{S}} [y_i = c]\, f_{\theta_{emb}}(I_i)$
In Eq. (3), $p_\theta^c$ represents the prototype vector of class $c$, and $I_i$ is an image in $\tilde{S}$ with label $y_i$. $Z = \sum_{(I_i, y_i) \in \tilde{S}} [y_i = c]$ is a normalization factor (the number of samples of class $c$), which prevents the value from going out of bounds, and $f_{\theta_{emb}}$ denotes the feature extraction module. The samples from the new support set $\tilde{S}$ are fed into the few-shot classifier along with the images from the query set; their distances are calculated by the metric function, and the softmax classifier outputs a probability vector $P$ that determines the probability of each image category. Given any image $I_i \in Q$, the probability of it belonging to category $c$ among the $N$ categories is calculated as shown in Eq. (4):

$p_\theta(y_i = c \mid I_i) = \dfrac{\exp\left(-d\left(f_{\theta_{emb}}(I_i), p_\theta^c\right)\right)}{\sum_{c'=1}^{N} \exp\left(-d\left(f_{\theta_{emb}}(I_i), p_\theta^{c'}\right)\right)}$
In Eq. (4), $p_\theta(y_i \mid I_i)$ is the probability that the few-shot classifier assigns image $I_i$ to each category, and $d(\cdot, \cdot)$ is the distance metric. In the embedded network, the fused images together with the support set images form the new expanded support set, which is generated after the image deformation network training. The new support set images are fed into the embedded network along with the query images, and their feature representations are generated by the ResNet-18 residual network, while the cross-entropy loss function combined with the softmax classifier yields the classification accuracies and losses for the query set images for category judgment. During this process, backpropagation is continuously executed to adjust and update the entire network and reduce the experimental error. The cross-entropy loss function is used to guide the training, and its minimization is shown in Eq. (5):

$\min_{\theta} \; -\sum_{(I_i, y_i) \in G} \log p_\theta(y_i \mid I_i)$
In Eq. (5), $G$ represents the query (test) set. The smaller the cross-entropy, the closer the two probability distributions are.
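The prototype computation and the distance-softmax classification described in this subsection (Eqs. (3)-(4)) can be sketched as follows; the function names and the toy 2-D features are illustrative assumptions.

```python
import numpy as np

def prototypes(features, labels, n_classes):
    """Class prototype p_c = mean of the embedded features with label c
    (Eq. 3); the normalizer Z is just the per-class sample count."""
    return np.stack([features[labels == c].mean(axis=0)
                     for c in range(n_classes)])

def classify(query_feat, protos):
    """Softmax over negative Euclidean distances to each prototype (Eq. 4)."""
    d = np.linalg.norm(protos - query_feat, axis=1)
    e = np.exp(-d - np.max(-d))          # numerically stable softmax
    return e / e.sum()

feats = np.array([[0., 0.], [0., 2.], [4., 0.], [4., 2.]])
labels = np.array([0, 0, 1, 1])
protos = prototypes(feats, labels, n_classes=2)   # [[0, 1], [4, 1]]
p = classify(np.array([0.5, 1.0]), protos)        # query lies near class 0
```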
3.2. Small Sample Image Classification Algorithm Based on Improved Deformed Meta-network
In the few-shot classification algorithm based on the image deformation meta-network, the image deformation network and the embedded network are trained end-to-end for image classification, and fused images are generated in an unsupervised manner. However, there is still room for optimizing the network in the way the auxiliary image set is selected and the weight vectors are generated. The improvement strategy proposed in the study focuses on the image deformation network within the meta-network. The optimization of the image deformation network has two main aspects: improving the selection method of auxiliary images and improving the weight generation network. The framework of the few-shot image classification network based on the improved deformation meta-network is shown in Fig. 3.
Fig. 3. Framework of the few-shot image classification network based on the improved deformation meta-network.
The key to improving classification accuracy is finding auxiliary pictures with higher similarity. For each few-shot classification task, the Euclidean distances between the support set pictures and all auxiliary pictures are calculated and sorted, and a few pictures are then randomly selected from those with the smallest distances (ranked at the front) for fusion with the support set, thereby expanding the support set. In the original experiments, images were randomly selected and merged from among the top 30 ranked images. The Euclidean distance $D_O(x, y)$ is calculated as shown in Eq. (6):

$D_O(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
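The top-30 nearest-neighbour selection of auxiliary images described above can be sketched as follows; `select_auxiliary` and the toy 1-D feature vectors are hypothetical.

```python
import numpy as np

def select_auxiliary(support_vec, gallery_vecs, pool=30, pick=5, rng=None):
    """Rank all auxiliary (gallery) images by Euclidean distance to the
    support image (Eq. 6) and randomly pick a few from the `pool` nearest,
    mirroring the paper's "random choice among the top 30" scheme."""
    rng = rng or np.random.default_rng()
    d = np.linalg.norm(gallery_vecs - support_vec, axis=1)
    nearest = np.argsort(d)[:pool]          # indices of the closest images
    return rng.choice(nearest, size=pick, replace=False)

gallery = np.arange(100, dtype=float)[:, None]   # 100 1-D "feature vectors"
support = np.array([0.0])
idx = select_auxiliary(support, gallery, pool=30, pick=5,
                       rng=np.random.default_rng(0))
# Every picked index comes from the 30 vectors nearest to the support.
```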
In Eq. (6), $x_i$ and $y_i$ denote the components of two $n$-dimensional vectors. This method effectively narrows the scope of image selection, and the images at the front of the ranking are indeed highly similar; however, because image similarity is quantified with the traditional Euclidean distance alone, the similarity metric still has limitations. To solve this problem, on the basis of the original metric learning using the Euclidean distance alone, the study proposes a method that combines a relation network with the Euclidean distance, and both jointly participate in the selection of auxiliary pictures. In the relation network, the relation module is divided into an embedding module and a relevance module. The embedding module extracts and concatenates the image feature information in the dataset, and this feature information is then used as the input of the relevance module. The output of the module is the relation score between the images in the training set and the images in the query set, and the classification of the images is accomplished based on this score. The relation score $r_{ij}$ is calculated as shown in Eq. (7):

$r_{ij} = g_\theta\left(C\left(f(x_i), f(x_j)\right)\right)$
In Eq. (7), $g_\theta$ denotes the relation network, which determines whether there is a correlation between the samples, $f$ denotes the embedding module, and $C$ denotes the concatenation (splicing) operation. The loss function of the relation network is shown in Eq. (8):

$L = \sum_{i} \sum_{j} \left(r_{ij} - \mathbf{1}[y_i = y_j]\right)^2$

In Eq. (8), the indicator $\mathbf{1}[y_i = y_j]$ outputs 1 when $y_i = y_j$ holds, and 0 otherwise. The structure of the relation network is shown schematically in Fig. 4.
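A minimal numerical sketch of the relation score of Eq. (7) follows, with the relation module $g_\theta$ reduced to a tiny two-layer network; the weights here are random and purely illustrative, not trained parameters of the paper's model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relation_score(f_i, f_j, W1, b1, W2, b2):
    """Relation score r_ij = g_theta(C(f_i, f_j)) (Eq. 7): the two feature
    vectors are concatenated (operator C) and passed through a tiny
    two-layer network g_theta whose sigmoid output lies in (0, 1)."""
    z = np.concatenate([f_i, f_j])           # splicing operation C
    h = np.maximum(0.0, W1 @ z + b1)         # ReLU hidden layer
    return float(sigmoid(W2 @ h + b2))       # scalar score

rng = np.random.default_rng(1)
W1, b1 = rng.standard_normal((8, 6)), np.zeros(8)   # toy weights, 3-D features
W2, b2 = rng.standard_normal(8), 0.0
r = relation_score(np.ones(3), np.ones(3), W1, b1, W2, b2)
```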
Fig. 4. Schematic diagram of the structure of the relationship network.
The relation network designed for the study consists of two convolutional blocks, two fully connected layers, and one activation layer, where each convolutional block is composed of a convolutional layer, a batch normalization layer, an activation function, and a max pooling layer. The properties of the relation network allow the model to learn and understand the relationships between multiple samples at the same time, which further improves the quality of the fused images and enhances the classification accuracy. However, the $3 \times 3$ image division makes the RGB channels in each region share the same weight value, which does not fully utilize the channel information. Therefore, the study adds per-RGB-channel weights to the image deformation weight vector. When the input image is linearly fused with the weight vector, the weights of the more important channels in the image are deliberately increased. In this way, the network pays more attention to useful feature information and ignores feature information that has little impact on the result, thus improving the image fusion effect.
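The per-channel weighting described above can be sketched as follows: the $3 \times 3$ grid now carries $9 \times 3 = 27$ weights, one per region and RGB channel. The helper `fuse_rgb` and the toy images are illustrative assumptions.

```python
import numpy as np

def fuse_rgb(probe, gallery, weights):
    """Channel-aware fusion: each of the 3 x 3 regions gets one weight per
    RGB channel (27 values instead of 9), so an important channel in a
    region can be emphasized independently of the other channels."""
    h, w, _ = probe.shape
    ys = np.linspace(0, h, 4, dtype=int)
    xs = np.linspace(0, w, 4, dtype=int)
    W = weights.reshape(3, 3, 3)            # (grid_row, grid_col, channel)
    fused = np.empty_like(probe, dtype=float)
    for r in range(3):
        for c in range(3):
            wr = W[r, c]                    # per-channel weights, shape (3,)
            sl = np.s_[ys[r]:ys[r + 1], xs[c]:xs[c + 1]]
            fused[sl] = wr * probe[sl] + (1 - wr) * gallery[sl]
    return fused

probe, gallery = np.ones((6, 6, 3)), np.zeros((6, 6, 3))
w = np.tile([1.0, 0.5, 0.0], 9)             # R kept, G blended, B replaced
syn = fuse_rgb(probe, gallery, w)
```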
3.3. Improvement of Image Classification Algorithm Based on Dynamic Adaptive Fusion
Strategy
Combining the relation network with the Euclidean distance in the selection of auxiliary images is beneficial for selecting images with higher similarity to the support set images. However, this approach still has some drawbacks in the process of generating fused images. Therefore, the study further proposes an innovative dynamic adaptive fusion strategy, which modifies the image segmentation method, accurately divides the required target features into one or several regions, makes better use of the effective information in the image, and avoids excessive segmentation of image features. This strategy optimizes the image deformation network by reforming the traditional auxiliary image selection method and correcting the weight vectors. The dynamic adaptive fusion strategy uses dynamic, unequal cuts to target the range of the target features for more effective image segmentation. Although this strategy has similarities with traditional localization, it is unique in that it frames the main features while maintaining their integrity, thus avoiding over-slicing. The structure of the adaptive network is shown in Fig. 5.
Fig. 5. Structure diagram of adaptive network.
The dynamic adaptive network can be regarded as a neural-network-based segmentation module containing a feature extraction layer and a proportional position generation layer. The inputs of this module are the support set images and the auxiliary images, and the output is the segmentation positions of the image. The module uses the self-learning ability of the neural network to capture the global and local features of the image to guide segmentation. The feature vectors are processed by the proportional position generation layer to output a one-dimensional vector of length $4W$, which is used as the position scale values for image segmentation. These position scale values need to be mapped to the corresponding points in the image to realize segmentation. The vector is first processed with the $\tanh$ activation function, and the converted values are then mapped to the regions on either side of the image center, as shown in Eq. (9):

$P = \frac{L}{2}\left(1 + \tanh(\tau v)\right)$
In Eq. (9), $P$ (giving the coordinates $(P_x, P_y)$) is the position obtained by the mapping, $L$ is the width or height of the image, $v$ is the raw output of the proportional position generation layer, and $\tau$ is the scaling parameter. The resulting dynamic matrix is then merged with the weight vectors to generate a new weight matrix that adjusts the fusion region of the image. The structure of the improved weight network is shown in Fig. 6.
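One plausible reading of the tanh position mapping (squashing unbounded layer outputs into positions on either side of the image center) can be sketched as follows; the exact form $P = \frac{L}{2}(1 + \tanh(\tau v))$ is an assumption consistent with the description, not a confirmed formula from the paper.

```python
import numpy as np

def to_positions(raw, length, tau=1.0):
    """Map raw network outputs to cut positions on either side of the
    image center via tanh:
        P = L/2 * (1 + tanh(tau * raw))
    tanh squashes to (-1, 1), so positions always fall inside [0, L]."""
    return length / 2.0 * (1.0 + np.tanh(tau * raw))

L = 84                                   # image side length
raw = np.array([-10.0, 0.0, 10.0])       # unbounded layer outputs
pos = to_positions(raw, L)
# raw = 0 lands exactly on the image center L/2; extremes stay in [0, L].
```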
Fig. 6. Structure diagram of the improved weight network.
As can be seen in Fig. 6, after the image is divided, the regions are of unequal size; the target region is carefully protected from damage, and its weight value is deliberately increased so as to improve the quality of the fused image for the subsequent few-shot classification. Under this dynamic adaptive fusion strategy, when different feature samples are processed, the input dynamic matrix undergoes various transformations: the image is still divided into nine regions, but these regions are of unequal size. This strategy can effectively perform a reasonable linear fusion of two images. The evaluation metrics for model performance include the precision rate, recall rate and accuracy rate. The precision rate is calculated as shown in Eq. (10):

$\text{Precision} = \frac{TP}{TP + FP}$
In Eq. (10), $TP$ represents True Positives and $FP$ represents False Positives. The recall rate represents the proportion of correctly predicted positive samples among all actual positive samples, i.e., the ratio of correct positive predictions to the sum of correct and incorrect predictions of positive samples, calculated as shown in Eq. (11):

$\text{Recall} = \frac{TP}{TP + FN}$

In Eq. (11), $FN$ represents False Negatives. The accuracy rate represents the proportion of correctly classified images in the total sample, calculated as shown in Eq. (12):

$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$

In Eq. (12), $TN$ represents True Negatives. In few-shot image classification tasks, accuracy is an important evaluation metric for measuring the performance of network models. The study uses the image classification accuracy on the query set to judge the performance of the few-shot classification network. For the dataset $D$, the accuracy rate is calculated as shown in Eq. (13):

$\text{Acc}(D) = \frac{1}{n_r} \sum_{i=1}^{n_r} G_r(\hat{y}_i = y_i)$

In Eq. (13), $n_r$ is the number of samples in the dataset, and $G_r$ is the indicator function, which is 1 when the predicted label $\hat{y}_i$ equals the true label $y_i$ and 0 otherwise.
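The evaluation metrics of Eqs. (10)-(13) can be computed directly from the confusion counts, as in this short sketch; the function names are illustrative.

```python
def metrics(tp, fp, fn, tn):
    """Precision, recall and accuracy from confusion counts (Eqs. 10-12)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy

def query_accuracy(predicted, true):
    """Eq. (13): fraction of query images whose predicted label matches
    the true label (the indicator G_r averaged over the n_r samples)."""
    return sum(p == t for p, t in zip(predicted, true)) / len(true)

p, r, a = metrics(tp=8, fp=2, fn=2, tn=8)        # p = r = a = 0.8
qa = query_accuracy([0, 1, 2, 1], [0, 1, 2, 2])  # 3 of 4 correct -> 0.75
```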
4. Small Sample Classification Algorithm Based on Image Deformation Meta-network and
Improvement Effect
To validate the performance of the two improvement strategies proposed for the FSL image classification algorithm, the study selected the Prototypical Network, Meta-SGD (ML with Stochastic Gradient Descent), Matching Network, Relation Network, and Model-Agnostic Meta-Learning (MAML) as comparison algorithms, and performed classification validation experiments on the Mini-ImageNet dataset.
4.1. Classification Effect of Image Deformation Meta-network Model Based on Small
Samples
In order to verify the effect of the proposed model on few-shot image classification, experiments were conducted on a Windows 10 system with an NVIDIA RTX 3070 graphics card and an Intel i9 processor; the experimental analysis was completed in MATLAB. ImageNet was selected as the source dataset, which contains over 14 million manually annotated image URLs covering more than 20,000 categories. Mini-ImageNet was selected for the experiments; it contains 60,000 $84 \times 84$ color images divided into 100 categories. The image division scheme determines the output length of the fully connected layer in the image deformation network, so the accuracies of different division schemes, namely pixel-level, $1 \times 1$, $3 \times 3$, $5 \times 5$ and $7 \times 7$, were compared. As shown in Fig. 7, as the number of samples increases, the classification accuracy trends upward for the pixel-level, $1 \times 1$, $3 \times 3$, $5 \times 5$ and $7 \times 7$ divisions alike. This indicates that increasing the number of samples can effectively improve the performance of the model, i.e., increase the classification accuracy. However, the $5 \times 5$ and $7 \times 7$ division methods are prone to over-segmentation, which destroys the main semantics of the image, while the $3 \times 3$ division not only ensures that key region information is preserved but also increases sample diversity, better guaranteeing the image classification accuracy. Therefore, the subsequent experiments directly use the $3 \times 3$ division scheme.
Fig. 7. Accuracy obtained by different image segmentation schemes.
The study conducted comparative experiments on the Mini-ImageNet dataset using different image classification algorithms, and the outcomes are shown in Fig. 8. From Fig. 8, the IDeMe Network achieves the best results in both the 1-shot and 5-shot classification tasks, with accuracies of 59.17% and 74.66%, respectively. This demonstrates the superiority and stability of the IDeMe Network algorithm in dealing with image classification problems. Considering the 1-shot and 5-shot results together, the IDeMe Network achieves the best performance and highest accuracy on the image classification task.
Fig. 8. Classification results of the few-shot image deformation meta-network model.
4.2. Experimental Results of Small Sample Image Classification Algorithm Based on
Improved Deformed Meta-network
To verify the performance of the proposed improvements, several common classification models were selected for comparison experiments. The improved image deformation meta-network uses the ResNet-18 framework, which has 17 convolutional layers and 1 fully connected layer. The research is conducted on the Mini-ImageNet dataset, mainly on the 5-way 1-shot and 5-way 5-shot FSL tasks. In the experiments, the Mini-ImageNet classes are split in the ratio 64:16:20, with 64 categories used as the training set, 16 categories as the validation set, and the remaining 20 categories as the test set. The experimental parameter settings for the two improved algorithms are shown in Table 1.
Table 1. Experimental parameter settings.
| Number | Parameter | IDeMe-R Network | IDeMe-RD Network |
| --- | --- | --- | --- |
| 1 | Total number of iterations | 100 | 50 |
| 2 | Initial learning rate | 0.001/0.01 | 0.001 |
| 3 | Learning decay rate | 0.5 | 0.5 |
| 4 | Batch size | 32 | 4 |
| 5 | Class number of training set | 64 | 64 |
| 6 | Class number of validation set | 16 | 16 |
| 7 | Class number of test set | 20 | 20 |
The performance comparison of several commonly used classification models on the Mini-ImageNet
dataset is shown in Fig. 9. From Fig. 9, for both 1-shot and 5-shot tasks, IDeMe-R Network outperforms all other algorithms,
with accuracies of 59.92% and 75.23%, respectively. This indicates that IDeMe-R Network
has a significant advantage in dealing with small-sample image classification problems.
In contrast, the performance of the original IDeMe Network is 59.17% and 74.66% in
the 1-shot and 5-shot cases, respectively, which is slightly inferior to that of the
IDeMe-R Network. This indicates that the improved deformed meta-network (IDeMe-R Network)
improves its performance, and verifies the effectiveness and feasibility of the algorithmic
improvement.
Fig. 9. Performance comparison of different classification algorithms on the Mini-ImageNet dataset.
4.3. Improvement Effect of Image Classification Algorithm Based on Dynamic Adaptive
Fusion Strategy
5-way 1-shot experiments were conducted on several commonly used FSL models and on the further improved classification algorithm, based on the Mini-ImageNet dataset, to compare and verify network performance. The 5-way 5-shot experiments were not performed because the parameters of the network module were further increased beyond the limits of the available hardware. The model training process of the network with the added dynamic adaptive fusion strategy (IDeMe-RD Network) is shown in Fig. 10. From the figure, both improved networks converge faster and are more stable. There is no parameter update in the testing phase, so the network remains adapted to the training data.
Fig. 10. The model training process of the network with the added dynamic adaptive
fusion strategy (IDeMe-RD Network).
The results of the comparison experiments between several commonly used classification models and the further improved IDeMe-RD Network are given in Table 2. From Table 2, the IDeMe-RD Network with the dynamic adaptive fusion strategy performs best on the 5-way 1-shot few-shot image classification task with an accuracy of 60.15% $\pm$ 0.28, significantly better than the other methods. This indicates that the dynamic adaptive fusion strategy plays a positive role in improving the classification accuracy. The IDeMe-R Network and IDeMe Network both adopt ResNet-18 in their structure, reaching accuracies of 59.83% $\pm$ 0.44 and 59.12% $\pm$ 0.86, respectively. This reflects the superiority of ResNet-18 in image classification, and also indicates that the IDeMe-R Network performs better than the IDeMe Network. The Matching Network has the weakest performance, with an accuracy of only 43.58% $\pm$ 0.85. These findings confirm the validity of the further improvement of the IDeMe-R Network. The differences in training error and time consumption among the models were also compared. According to the results, the IDeMe-RD network performed best in 5-way 1-shot training with a training error of 0.162, followed by the IDeMe-R network with a training error of 0.172; the training errors of the IDeMe network, MAML, and the Relation Network are 0.182, 0.191, and 0.213, respectively. Comparing the time consumption of the models in actual image classification, the IDeMe-RD and IDeMe-R networks have the lowest time consumption, at 16.150 s and 18.150 s respectively, while the worst performing is the Prototypical network at 26.120 s. It can be seen that the IDeMe-RD network with the adaptive fusion strategy performs best in classification accuracy, error, and training time.
Table 2. Improvement of image classification algorithm based on dynamic adaptive fusion
strategy.
| Number | Module | Structure of feature extractor | 5-way 1-shot (%) | Training error (%) | Training time (s) |
| --- | --- | --- | --- | --- | --- |
| 1 | Prototypical network | 64(3)-64(3)-64(3)-64(3) | 49.590$\pm$0.750 | 0.242 | 26.120 |
| 2 | Matching Network | Inception Network | 43.580$\pm$0.850 | 0.221 | 24.120 |
| 3 | Relation Network | 64(3)-64(3)-64(3)-64(3) | 56.980$\pm$0.920 | 0.213 | 23.120 |
| 4 | MAML | 32(3)-32(3)-32(3)-32(3) | 48.690$\pm$1.800 | 0.191 | 20.120 |
| 5 | IDeMe Network | ResNet-18 | 59.120$\pm$0.860 | 0.182 | 19.240 |
| 6 | IDeMe-R Network | ResNet-18 | 59.830$\pm$0.440 | 0.172 | 18.150 |
| 7 | IDeMe-RD Network | ResNet-18 | 60.150$\pm$0.280 | 0.162 | 16.150 |
Finally, the IDeMe Network, IDeMe-R Network, and IDeMe-RD Network were selected for Precision-Recall (P-R) curve analysis. The larger the area under the P-R curve, the better the overall training effect of the model. Fig. 11(a) shows the test results under the 5-way 1-shot setting for the IDeMe Network, IDeMe-R Network, and IDeMe-RD Network. Judging from the P-R curve area, the IDeMe-RD network has a larger curve area and better model performance; compared to the IDeMe-R network and the IDeMe network, the training performance of the IDeMe-RD network is improved by 8.35% and 8.21%, respectively. Fig. 11(b) shows the test results under the 5-way 5-shot setting, which are similar to the 5-way 1-shot results: the IDeMe-RD network has a larger area under the P-R curve and better training performance, improving on the IDeMe-R network and the IDeMe network by 7.32% and 9.21%, respectively. It can be seen that the technology proposed in this study has outstanding application effects in the field of image classification, meeting the requirements of image data processing.
Fig. 11. Comparison of P-R curve performance under different data.