
2024

Acceptance Ratio

21%


  1. (Organization Department, Shijiazhuang College of Applied Technology, Shijiazhuang 050000, China)
  2. (Party Affairs Department, Shijiazhuang College of Applied Technology, Shijiazhuang 050000, China)
  3. (Information Technology Center, Hebei Normal University, Shijiazhuang 050000, China; lishaokang@hebtu.edu.cn)



CNN, AlexNet model, TM data, remote sensing images

1. Introduction

In the context of rapid social development and growing population pressure, the demand for land use has increased greatly, and land cover detection and related issues have become a popular research area [1]. Remote sensing is an important tool for dynamic land cover monitoring because of its low labour cost, wide detection range and high efficiency [2,3,4]. Current approaches to land cover classification from remote sensing images (RSIs) focus on specific types of problems and cannot be applied widely, so improving the efficiency and accuracy of remote sensing classification is an important direction for optimisation [5]. Neural networks have achieved excellent results in many fields but have been applied relatively rarely to land cover problems, because existing models suffer from a mismatch between the sample size and their input and output requirements and therefore need to be adapted [6]. Convolutional neural networks (CNNs), which are widely used in image classification, were therefore introduced into the land cover classification problem. The AlexNet model was chosen for feature extraction, and the extracted features were fed to a support vector machine (SVM) classifier for validation; however, a large difference remains between the model's input image size and the size of the land cover samples. This study therefore designs two CNN models based on AlexNet, LCNet-27 and LCNet-13, to resolve this mismatch between the training sample size and the model design, and analyses the classification results under different input sizes and resolutions.

2. Related Works

CNNs are widely used in image classification, and in recent years many researchers have optimised and extended them for a range of classification tasks. Ragupathy and Karunakaran developed a method for detecting meningioma brain tumours based on fuzzy logic, a co-active adaptive neuro-fuzzy inference system (CANFIS) and U-Net CNN classification. The source brain images are enhanced with fuzzy logic, and the dual-tree complex wavelet transform is then applied to the enhanced images at different scales. Features computed from the decomposed subband images are classified with CANFIS [7]. Zhao et al. proposed a method to improve the accuracy of textile defect recognition, applying an ensemble-learning-based CNN to an enhanced TILDA database. The method classifies textile defects quickly and efficiently and reduces textile production costs [8]. Raichura et al. proposed an efficient combination of a CNN with extreme gradient boosting (XGBoost). Data derived from various test cases were fed to a 1D CNN for high-level feature extraction. An Indian power system was modelled and simulated in PSCAD software programmed with the proposed algorithm, and a multi-run function was used to collect large amounts of data on different abnormal conditions. High-performance CPUs were used to train and test the proposed AI technique [9]. Chu et al. used a malicious code visualisation algorithm to turn homology classification of malicious code into an image classification problem. They built a CNN for malicious code images and trained it to perform homology classification; the results demonstrate the classification efficiency of this CNN [10]. Algarni et al. proposed a hybrid CNN-based classification and segmentation method for detecting novel coronavirus (COVID-19) pneumonia in computed tomography (CT) images, in which a classification phase is first applied to the input CT images.
A segmentation stage then differentiates ordinary pneumonia from COVID-19 pneumonia in the CT images [11]. Segmenting FIB-SEM image data is challenging. Skärberg et al. used CNNs to segment FIB-SEM image data, working with three datasets of different porosities relevant to controlled drug release applications. The results agreed well with manual segmentation, and the approach improved on a previous random forest classifier trained on a Gaussian model on the same dataset [12]. Shahedi et al. investigated how incorporating minimal user interaction into a fully convolutional neural network (CNN) improves deep-learning-based 3D image segmentation. When a limited number of sparse boundary landmarks is provided on the input image, the performance of the CNN approaches the inter-observer variability seen in manual segmentation [13].

Researchers have proposed many solutions to improve the efficiency of land cover classification of RSIs. Tong et al. proposed a scheme for classifying unlabelled high-resolution remote sensing (HRRS) images by applying a deep model obtained from a labelled land cover dataset. A deep CNN is first pre-trained on a well-annotated land cover dataset; a given unlabelled target image is then classified block by block with the pre-trained CNN model [14]. Jijon-Palma et al. introduced a hybrid pixel-based model. Including convolution in the encoding and decoding steps yields a feature-based description of the hyperspectral features of each pixel that is suitable for an initial unsupervised classification. Experiments show that the SAE-1DCNN method is more efficient in terms of hyperspectral classification accuracy and computational complexity and can serve as an alternative method for classifying hyperspectral data [15]. Zhang et al. proposed a new semantic segmentation network, LResU-Net, which adds residual convolution units (RCUs) and recurrent convolution units (LCUs) to the U-Net framework to classify land cover in high-resolution images acquired by UAVs. The model improves classification accuracy by adding gradient mapping through the RCUs, modifying the size of the convolutional layers (CLs) through the LCUs, and reducing the convolution kernel size. Experimental results show that LResU-Net is more accurate than the baseline algorithm [16]. Liao et al. proposed a scheme for selecting samples and annotating them automatically with vector maps to create a new labelled benchmark. They also developed a multi-scale object-driven CNN for the new benchmark; the method performed well, validating its feasibility and effectiveness for land cover classification [17].
Kulkarni and Vijaya proposed a combined approach to land cover classification that uses Dempster-Shafer combination theory to fuse the outputs of two classifiers, random forest and SVM. Because the output of each machine learning classifier carries inherent uncertainty, the two outputs are fused to obtain the land cover map, and experiments show improved accuracy [18]. Cui et al. proposed a new dual-triple attention network (DTAN) that classifies hyperspectral images with high precision by capturing cross-dimensional interaction information. DTAN has two branches, the spectral branch and the spatial branch, which extract the spectral and spatial information of hyperspectral images, respectively. The Efficient Channel Attention (ECA) module is introduced into DenseNet so that DenseNet achieves partial cross-channel interaction. A series of experiments showed that DTAN has significant advantages over other models when the number of training samples is very small [19]. To assess the potential of remote sensing to support tracer studies in more turbid rivers, Carl et al. injected rhodamine WT dye into the Missouri River and collected in situ spectra from a boat, video from a small unmanned aircraft system, and orthophotos from an airplane. Applying an optimal band ratio analysis algorithm to the field spectra revealed a strong correlation between the spectrally derived values and the field concentration measurements [20].

As the studies above show, CNNs are widely used in image classification, and the RSI land cover field is actively searching for better classification schemes. Optimising CNN-based models specifically for land cover classification is therefore a novel and worthwhile direction.

3. CNN-Based Land Cover Classification Model Research

3.1 The AlexNet Model and Its Generalised Fine-Tuning Model

The AlexNet model, proposed by Krizhevsky et al. for the ILSVRC 2012 competition, is a deep CNN known for its very low image classification error rate, a top-5 error rate of only 16.4% [21]. The model takes a $227 \times 227$ input image and passes it through convolutional layers (CLs), ReLU activations, pooling layers and local response normalisation, then through dropout to prevent overfitting and two consecutive fully connected layers, and finally through a SoftMax classifier. Dropout mitigates overfitting during training, and data augmentation such as horizontal flipping enlarges the training data. The ReLU function serves as a non-saturating activation function for the neurons, which makes training more efficient [22]. Because ReLU is a piecewise linear function, its forward pass and its gradient in back propagation need fewer operations than saturating activations such as the sigmoid, with no exponentials or divisions, so it is cheaper to compute. The structure of AlexNet is presented in Fig. 1.
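To make the cost argument concrete, the following minimal Python sketch (illustrative only, not code from this study; the function names are hypothetical) compares ReLU with the logistic sigmoid, the saturating activation it is usually contrasted with. ReLU needs one comparison and its derivative is 0 or 1, whereas the sigmoid needs an exponential and a division, and its derivative never exceeds 0.25 and shrinks toward zero for large $|z|$.

```python
import math


def relu(z: float) -> float:
    """ReLU: a single comparison, no exponential or division."""
    return z if z > 0.0 else 0.0


def relu_grad(z: float) -> float:
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise (0 chosen at z = 0)."""
    return 1.0 if z > 0.0 else 0.0


def sigmoid(z: float) -> float:
    """Logistic sigmoid: needs an exponential and a division."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z: float) -> float:
    """Derivative s(z) * (1 - s(z)); at most 0.25 and vanishing for large |z|."""
    s = sigmoid(z)
    return s * (1.0 - s)


if __name__ == "__main__":
    for z in (-5.0, 0.5, 5.0):
        print(f"z={z:+.1f}  relu={relu(z):.3f} grad={relu_grad(z):.0f}  "
              f"sigmoid={sigmoid(z):.3f} grad={sigmoid_grad(z):.4f}")
```

For positive inputs the ReLU gradient stays at 1, while the sigmoid gradient at $z = 5$ is already below 0.01, which is why ReLU is described as non-saturating.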

Fig. 1. AlexNet model structure diagram.


As shown in Fig. 1, AlexNet contains five CLs and two fully connected layers. The standard input image size is $227 \times 227$ pixels, and the feature maps are reduced to $6\times6$ pixels before the fully connected layers, whereas land cover classification involves fewer categories and fewer samples. Back propagation uses the error of the CNN output to correct the network weights by gradient descent. The CNN updates the connection weights with the stochastic gradient descent (SGD) algorithm, whose main formulas are shown in Equation (1).

(1)
$ \left\{\begin{aligned} \delta _{j}^{l} &=\beta _{j}^{l+1} \left(f'(u_{j}^{l} )\circ up(\delta _{j}^{l+1} )\right),\\ \frac{\partial L}{\partial b_{j}^{l} } &=\sum _{u,v}(\delta _{j}^{l} )_{uv},\\ \frac{\partial L}{\partial \omega _{ij}^{l} } &=\sum _{u,v}(\delta _{j}^{l} )_{uv} (P_{i}^{l-1} )_{uv},\\ \Delta \omega _{t+1}^{l} &=\mu \,\Delta \omega _{t}^{l} -\eta \frac{\partial L}{\partial \omega _{ij}^{l} },\qquad \omega _{t+1}^{l} =\omega _{t}^{l} +\Delta \omega _{t+1}^{l}. \end{aligned}\right. $

In Equation (1), $\delta _{j}^{l} $ and $\delta _{j}^{l+1} $ are the sensitivities of the loss to the pre-activations (equivalently, to the biases) of the $j$-th feature map in layers $l$ and $l+1$. $\beta _{j}^{l+1} $ is the downsampling coefficient of layer $l+1$, $f'$ is the derivative of the activation function, and $\circ$ denotes element-wise multiplication. $u_{j}^{l} $ is the input to the activation function (the pre-activation) of the $j$-th feature map in layer $l$. $up$ is the upsampling function. $L$ is the loss function describing the error between the samples and their labels, and $(P_{i}^{l-1})_{uv} $ is the patch of $x_{i}^{l-1} $ that is multiplied element-wise by $\omega _{ij}^{l} $ in the convolution to produce element $(u,v)$ of the output. $\omega _{t+1}^{l} $ and $\omega _{t}^{l} $ are the weights of layer $l$ at iterations $t+1$ and $t$, and $\Delta \omega _{t+1}^{l} $ is the weight update. $\eta $ is the learning rate, and $\mu $ is the momentum coefficient. The basic building blocks of a CNN, and therefore of AlexNet, are CLs, ReLU units, downsampling layers, normalisation, dropout and a SoftMax classifier. The output of each channel in a CL is calculated as shown in Equation (2).
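The standard form of the momentum update used with SGD is $\Delta\omega_{t+1} = \mu\,\Delta\omega_t - \eta\,\partial L/\partial\omega$, followed by $\omega_{t+1} = \omega_t + \Delta\omega_{t+1}$. The minimal Python sketch below applies this rule to a single scalar weight; the quadratic toy loss, the learning rate and the momentum value are illustrative assumptions, not settings reported in this study.

```python
from typing import Callable, List


def sgd_momentum(grad: Callable[[float], float], w0: float,
                 lr: float = 0.1, momentum: float = 0.9,
                 steps: int = 200) -> List[float]:
    """Minimise a scalar loss with SGD plus momentum.

    grad     -- returns dL/dw at the current weight (toy loss, assumed)
    lr       -- learning rate (eta)
    momentum -- momentum coefficient (mu)
    Returns the trajectory w_0, w_1, ..., w_steps.
    """
    w, delta = w0, 0.0
    history = [w]
    for _ in range(steps):
        delta = momentum * delta - lr * grad(w)  # delta_{t+1} = mu*delta_t - eta*dL/dw
        w = w + delta                            # w_{t+1} = w_t + delta_{t+1}
        history.append(w)
    return history


if __name__ == "__main__":
    # Toy loss L(w) = (w - 3)^2, so dL/dw = 2 (w - 3); the minimum is at w = 3.
    trajectory = sgd_momentum(lambda w: 2.0 * (w - 3.0), w0=0.0)
    print(f"final w = {trajectory[-1]:.4f}")
```

The momentum term accumulates past updates, so consistent gradient directions build up speed while oscillating components partly cancel.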

(2)
$ x_{j}^{l} =f(u_{j}^{l} ) . $

In Equation (2), $x_{j}^{l} $ is the output of the $j$-th channel of the $l$-th CL, and $f$ is the activation function. Its input $u_{j}^{l} $ is obtained by convolving the feature maps $x_{i}^{l-1} $ of the previous layer with the corresponding kernels and adding a bias. The formula for $u_{j}^{l} $ is shown in Equation (3).

(3)
$ u_{j}^{l} =\sum _{i\in M_{j}}x_{i}^{l-1} *k_{ij}^{l} +b_{j}^{l} . $

In Equation (3), $M_{j} $ is the set of feature maps in the previous layer that are combined to compute $u_{j}^{l} $, $*$ denotes convolution, $k_{ij}^{l} $ is the convolution kernel, and $b_{j}^{l} $ is the bias of the output feature map. The ReLU activation function is shown in Equation (4).
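Equations (2) to (4) together define the forward pass of one output channel of a CL. The pure-Python sketch below (illustrative only; the function names are hypothetical) computes it for small feature maps stored as nested lists. Like most deep learning libraries it uses cross-correlation, i.e. the kernel is not flipped, which does not change what the layer can represent because the kernels are learned. Stride 1 and "valid" convolution without padding are assumptions made for simplicity.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(x: Matrix, k: Matrix) -> Matrix:
    """'Valid' 2-D cross-correlation with stride 1 (what CNN libraries call convolution)."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[r + a][c + b] * k[a][b] for a in range(kh) for b in range(kw))
             for c in range(ow)]
            for r in range(oh)]


def conv_layer_channel(inputs: List[Matrix], kernels: List[Matrix], bias: float) -> Matrix:
    """Output channel j: x_j = ReLU(sum_i x_i * k_ij + b_j), i.e. Equations (2)-(4)."""
    if not inputs or len(inputs) != len(kernels):
        raise ValueError("need one kernel per input feature map")
    acc = conv2d_valid(inputs[0], kernels[0])
    for x_i, k_ij in zip(inputs[1:], kernels[1:]):  # sum over the maps in M_j
        y = conv2d_valid(x_i, k_ij)
        acc = [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(acc, y)]
    return [[max(0.0, v + bias) for v in row] for row in acc]  # add bias, then ReLU


if __name__ == "__main__":
    x1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    x2 = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
    k1 = [[1, 0], [0, -1]]
    k2 = [[1, 1], [1, 1]]
    print(conv_layer_channel([x1, x2], [k1, k2], bias=1.0))
```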

(4)
$ f(z)=\max (0,z) . $

In Equation (4), $z$ is the result of the convolution computed on the feature maps of the previous layer. The downsampling layer takes the resulting feature map, reduces its size and hence the number of parameters in later layers, and outputs the downsampled feature map as shown in Equation (5).

(5)
$ x_{j}^{l} =f(u_{j}^{l} ) . $

Here, $u_{j}^{l} $ is calculated as shown in Equation (6).

(6)
$ u_{j}^{l} =\beta _{j}^{l} down(x_{j}^{l-1} )+b_{j}^{l} . $

In Equation (6), $\beta _{j}^{l} $ is the multiplicative coefficient of the downsampling operation, $u_{j}^{l} $ is the pre-activation of the $j$-th channel of downsampling layer $l$, $x_{j}^{l-1} $ is the $j$-th feature map of the previous layer, and $b_{j}^{l} $ is the bias term of the downsampling layer. $down$ is the downsampling function, which takes the mean or the maximum value over each pooling region of $x_{j}^{l-1} $. The corresponding local response normalisation is shown in Equation (7).
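A downsampling layer in the sense of Equations (5) and (6) can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the study's implementation: the pooling windows are non-overlapping (AlexNet itself uses overlapping $3\times3$ windows with stride 2), and $f$ is taken to be ReLU. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def down(x: Matrix, size: int = 2, mode: str = "max") -> Matrix:
    """Non-overlapping size x size pooling; incomplete border windows are dropped."""
    if mode not in ("max", "mean"):
        raise ValueError("mode must be 'max' or 'mean'")
    out = []
    for r in range(0, len(x) - size + 1, size):
        row = []
        for c in range(0, len(x[0]) - size + 1, size):
            window = [x[r + a][c + b] for a in range(size) for b in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out


def downsampling_layer(x: Matrix, beta: float, bias: float,
                       size: int = 2, mode: str = "max") -> Matrix:
    """x_j^l = f(beta * down(x_j^{l-1}) + b_j^l), with f = ReLU (assumed)."""
    return [[max(0.0, beta * v + bias) for v in row] for row in down(x, size, mode)]


if __name__ == "__main__":
    fmap = [[1, 2, 5, 6], [3, 4, 7, 8], [0, 0, 1, 1], [0, 0, 1, 1]]
    print(down(fmap), down(fmap, mode="mean"))
    print(downsampling_layer(fmap, beta=0.5, bias=-1.0))
```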

(7)
$ b_{x,y}^{i} =a_{x,y}^{i} \Big/ \left(k+\alpha \sum _{j=\max (0,\,i-n/2)}^{\min (N-1,\,i+n/2)}\left(a_{x,y}^{j} \right)^{2} \right)^{\beta } . $

In Equation (7), $a_{x,y}^{i} $ is the output of the $i$-th convolution kernel at position $(x,y)$ after the activation function, $b_{x,y}^{i} $ is the normalised output, the sum runs over the $n$ adjacent kernel maps at the same position, $N$ is the total number of kernels in the layer, and $k$, $\alpha $ and $\beta $ are hyperparameters. The SoftMax function is shown in Equation (8).
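Local response normalisation acts across channels at a single spatial position, following the formula in Equation (7). The pure-Python sketch below is illustrative only; its default hyperparameters ($k = 2$, $n = 5$, $\alpha = 10^{-4}$, $\beta = 0.75$) are the values reported for the original AlexNet, and the values used in this study are not stated. The function name is hypothetical.

```python
from typing import List


def lrn_at_pixel(a: List[float], n: int = 5, k: float = 2.0,
                 alpha: float = 1e-4, beta: float = 0.75) -> List[float]:
    """Local response normalisation across channels at one (x, y) position.

    a[i] is the post-activation output of kernel i at that position; N = len(a).
    Each output is a[i] divided by (k + alpha * sum of squares of the
    neighbouring channels i - n//2 .. i + n//2, clipped to [0, N-1]) ** beta.
    """
    N = len(a)
    out = []
    for i in range(N):
        lo, hi = max(0, i - n // 2), min(N - 1, i + n // 2)
        sq_sum = sum(a[j] ** 2 for j in range(lo, hi + 1))
        out.append(a[i] / (k + alpha * sq_sum) ** beta)
    return out


if __name__ == "__main__":
    # Hypothetical activations of 4 channels at one pixel.
    print(lrn_at_pixel([1.0, 5.0, 0.5, 2.0]))
```

Channels with strong neighbours are damped more, which is the lateral-inhibition effect the normalisation is meant to provide.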

(8)
$ \left\{\begin{aligned} \sigma (z)_{j} &=\frac{e^{z_{j}} }{\sum _{m=1}^{K}e^{z_{m}} },\\ \sum _{j=1}^{K}\sigma (z)_{j} &=1. \end{aligned}\right. $

In Equation (8), $K$ is the number of neurons in the output layer and $z_{j} $ is the output value predicted by the model for the $j$-th category. AlexNet cannot be applied directly to remote sensing land cover image classification, so this study adapts the AlexNet structure to the land cover classification problem and fine-tunes it so that it applies more widely to remote sensing land cover images. Specifically, fine-tuning initialises the model with the parameters of a fully trained model. The advantage is that starting from existing parameters reduces the training workload: the parameters only need small adjustments on the selected samples until the model fits the new data. Remote sensing land cover image classification consists of image preprocessing, feature extraction, classification by an algorithm, and post-processing with accuracy evaluation. The remote sensing image classification process is shown in Fig. 2.
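The SoftMax of Equation (8) turns the $K$ output-layer values into class probabilities. A minimal Python sketch (illustrative; the scores are hypothetical) is shown below. Subtracting the maximum score before exponentiating leaves the result unchanged mathematically but prevents overflow for large scores, a common implementation choice rather than something specified in the text.

```python
import math
from typing import List


def softmax(z: List[float]) -> List[float]:
    """sigma(z)_j = exp(z_j) / sum_m exp(z_m), computed stably via max subtraction."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.1]  # hypothetical output-layer values for K = 3 classes
    probs = softmax(scores)
    print([round(p, 3) for p in probs], sum(probs))
```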

Fig. 2. Remote sensing image classification flowchart.


As shown in Fig. 2, remote sensing image classification first requires image preprocessing, because the learning efficiency of the algorithm depends closely on the quality of the input data [23]. Preprocessing mainly consists of radiometric correction, geometric correction, image enhancement and noise removal, which reduce the impact of these factors on later processing. Next, features are selected and extracted to improve classification accuracy; with good features, even a simple algorithm achieves good results. The features fall into two kinds, spectral and spatial (texture). Spectral features are the spectral values of ground objects, such as grey values or ratios between bands, while spatial features are the main cues for human recognition [24]. Features directly determine how good a classification is, and better features are the key to accurate image classification. Common supervised classification algorithms are maximum likelihood, SVMs and artificial neural networks (ANNs). The maximum likelihood method achieves relatively high accuracy but is computationally expensive, processes data inefficiently and needs a large amount of data. It builds a discriminant function based on Bayesian statistics, and the posterior probability of $x$ is calculated as shown in Equation (9).

(9)
$ P(y_{i} \mid x)=\frac{p(x\mid y_{i} )p(y_{i} )}{p(x)} =\frac{p(x\mid y_{i} )p(y_{i} )}{\sum _{j=1}^{S}p(x\mid y_{j} )p(y_{j} ) }. $

In Equation (9), $S$ is the number of categories, $p(y_{i} )$ is the prior probability of the $i$-th category, and $p(x\mid y_{i} )$ is the conditional probability density of $x$ for the $i$-th category. Because the denominator $p(x)$ is the same for every category, $x$ is assigned to category $y_{i} $ when Equation (10) holds.

(10)
$ p(x\mid y_{i} )p(y_{i} )=\max _{j} p(x\mid y_{j} )p(y_{j} ). $
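The decision rule of Equations (9) and (10) can be sketched as a Gaussian maximum likelihood classifier. The sketch below is an illustration, not the study's implementation, and it makes one simplifying assumption that a full remote sensing implementation would not: the bands are treated as independent (diagonal covariance), which avoids a matrix inverse. It works with logarithms because $p(x)$ is common to all classes and products of small densities underflow. The function names and the toy two-band pixel values are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

ClassModel = Tuple[List[float], List[float], float]  # (band means, band variances, prior)


def fit_gaussian_ml(samples: Sequence[Sequence[float]],
                    labels: Sequence[str]) -> Dict[str, ClassModel]:
    """Estimate per-class band means, variances and priors p(y_i) from training pixels."""
    by_class: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    n_total = len(samples)
    model: Dict[str, ClassModel] = {}
    for y, xs in by_class.items():
        d = len(xs[0])
        means = [sum(x[b] for x in xs) / len(xs) for b in range(d)]
        # A small floor keeps variances positive for tiny or constant classes.
        variances = [max(sum((x[b] - means[b]) ** 2 for x in xs) / len(xs), 1e-6)
                     for b in range(d)]
        model[y] = (means, variances, len(xs) / n_total)
    return model


def log_score(x: Sequence[float], means: List[float],
              variances: List[float], prior: float) -> float:
    """log p(x|y_i) + log p(y_i), assuming independent Gaussian bands."""
    ll = sum(-0.5 * math.log(2 * math.pi * v) - (xb - m) ** 2 / (2 * v)
             for xb, m, v in zip(x, means, variances))
    return ll + math.log(prior)


def classify(x: Sequence[float], model: Dict[str, ClassModel]) -> str:
    """Assign x to the class that maximises p(x|y_i) p(y_i), as in Equation (10)."""
    return max(model, key=lambda y: log_score(x, *model[y]))


if __name__ == "__main__":
    pixels = [[10, 12], [11, 13], [9, 11], [50, 80], [52, 78], [48, 82]]
    classes = ["water"] * 3 + ["vegetation"] * 3
    m = fit_gaussian_ml(pixels, classes)
    print(classify([10, 12], m), classify([51, 79], m))
```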

SVMs require not only correct classification but also a high degree of confidence in the result. SVM classification maps the input data from a low-dimensional space into a high-dimensional space with a kernel function and seeks the optimal separating hyperplane in that space, namely the one that maximises the margin between the classes. First, samples of known class are selected for training, and Lagrangian duality yields the quadratic optimisation problem shown in Equation (11).

(11)
$ W(a)=\sum _{i=1}^{l}a_{i} -\frac{1}{2} \sum _{i=1}^{l}\sum _{j=1}^{l}y_{i} y_{j} a_{i} a_{j} k(x_{i} ,x_{j} ). $

In Equation (11), which is maximised over $a$, $l$ is the number of training samples, $y_{i} \in \{-1,+1\}$ is the class label of sample $x_{i} $, $k$ is the kernel function, and $a_{i} $ are the Lagrange multipliers. Common kernel functions are shown in Equation (12).

(12)
$ \left\{\begin{aligned} k(x,y)&=\left(s\,(x\cdot y)+c\right)^{d},\\ k(x,y)&=\exp \left(\frac{-\left\| x-y\right\| ^{2} }{2\sigma ^{2} } \right). \end{aligned}\right. $
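The two kernels of Equation (12), and the way they enter an SVM decision rule of the form $\mathrm{sgn}\left[\sum_i y_i a_i k(x_i, x) + b\right]$ (Equation (13), with the intercept $b$ described in the text), can be sketched in pure Python as follows. In practice the multipliers $a_i$ and the intercept $b$ come from solving the quadratic program in Equation (11); here the support vectors, multipliers and intercept are made-up values for illustration, and the function names are hypothetical.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def poly_kernel(x: Vector, y: Vector, s: float = 1.0, c: float = 1.0, d: int = 2) -> float:
    """Polynomial kernel k(x, y) = (s (x . y) + c)^d."""
    return (s * dot(x, y) + c) ** d


def rbf_kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def svm_decision(x: Vector, support_vectors: Sequence[Vector], labels: Sequence[int],
                 alphas: Sequence[float], b: float,
                 kernel: Callable[[Vector, Vector], float] = rbf_kernel) -> int:
    """sgn(sum_i y_i a_i k(x_i, x) + b); a score of exactly 0 is mapped to +1 here."""
    total = sum(y * a * kernel(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if total >= 0 else -1


if __name__ == "__main__":
    svs = [[0.0, 0.0], [2.0, 2.0]]  # hypothetical support vectors
    ys = [-1, 1]
    alphas = [1.0, 1.0]
    print(svm_decision([0.1, 0.1], svs, ys, alphas, b=0.0),
          svm_decision([2.0, 1.9], svs, ys, alphas, b=0.0))
```

Only the support vectors (samples with nonzero $a_i$) contribute to the sum, which is why a trained SVM can be evaluated without the rest of the training set.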

Solving this quadratic programming problem gives the multipliers $a_{i} $; the training samples with nonzero $a_{i} $ are the support vectors, and the classification function is given in Equation (13).

(13)
$ f(x)=\mathrm{sgn}\left[\sum _{i=1}^{l}y_{i} a_{i} k(x_{i} ,x)+b \right] . $

In Equation (13), $\mathrm{sgn}$ is the sign function, which assigns the sample to one of the two classes, and $b$ is the bias (intercept) term. The category of an unknown sample $x$ from the unclassified image is obtained by substituting it into the classification function. The strength of the ANN model is its powerful fitting ability. An ANN is trained on samples of known categories with the back propagation gradient descent algorithm and is then used to classify unlabelled data. After classification, accuracy is usually assessed with statistics such as user's accuracy, overall classification accuracy and the Kappa coefficient. The Kappa coefficient measures the agreement between the classification result and the reference image beyond what chance would produce, and is calculated as shown in Equation (14).

(14)
$ Kappa=\frac{N\sum _{i=1}^{n}x_{ii} -\sum _{i=1}^{n}x_{i+} x_{+i} }{N^{2} -\sum _{i=1}^{n}x_{i+} x_{+i} } . $

In Equation (14), $x_{ii} $ is the number of pixels assigned to category $i$ in the classification result that also belong to category $i$ in the reference image. $x_{i+} =\sum _{j=1}^{n}x_{ij} $ is the number of pixels assigned to category $i$ in the classification result (the row total), and $x_{+i} =\sum _{j=1}^{n}x_{ji} $ is the number of pixels of category $i$ in the reference image (the column total). $n$ is the number of categories, and $N$ is the total number of samples. The workflow of the fine-tuned AlexNet CNN applied to land cover classification is shown in Fig. 3.
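The accuracy statistics above can all be computed from a confusion matrix. The pure-Python sketch below (illustrative; the function names and the six-pixel example are hypothetical) uses the convention of Equation (14): rows are classes in the classification result and columns are classes in the reference image, so row totals are $x_{i+}$ and column totals are $x_{+i}$.

```python
from typing import List, Sequence, Tuple


def confusion_matrix(pred: Sequence[int], ref: Sequence[int], n: int) -> List[List[int]]:
    """m[i][j] = number of pixels classified as class i whose reference class is j."""
    m = [[0] * n for _ in range(n)]
    for p, r in zip(pred, ref):
        m[p][r] += 1
    return m


def accuracy_metrics(m: List[List[int]]) -> Tuple[float, List[float], float]:
    """Return (overall accuracy, user's accuracy per class, Kappa coefficient)."""
    n = len(m)
    N = sum(sum(row) for row in m)
    diag = sum(m[i][i] for i in range(n))
    row_tot = [sum(m[i]) for i in range(n)]                       # x_{i+}
    col_tot = [sum(m[i][j] for i in range(n)) for j in range(n)]  # x_{+i}
    overall = diag / N
    users = [m[i][i] / row_tot[i] if row_tot[i] else float("nan") for i in range(n)]
    chance = sum(row_tot[i] * col_tot[i] for i in range(n))
    kappa = (N * diag - chance) / (N * N - chance)
    return overall, users, kappa


if __name__ == "__main__":
    predicted = [0, 0, 1, 1, 1, 2]  # hypothetical classified pixels
    reference = [0, 1, 1, 1, 2, 2]  # hypothetical reference labels
    cm = confusion_matrix(predicted, reference, n=3)
    print(cm, accuracy_metrics(cm))
```

In this example the overall accuracy is 4/6 while Kappa is 11/23, about 0.48: Kappa is lower because it discounts the agreement expected by chance, consistent with the observation later in the paper that overall accuracy exceeds the Kappa coefficient.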

Fig. 3. AlexNet model finetune land cover classification flow chart.


3.2 Construction of the Land Cover Classification Model Based on LCNet-27 and LCNet-13

The LCNet family consists of two CNNs, LCNet-27 and LCNet-13. The overall process of land cover classification with the LCNet models, shown in Fig. 4, has five stages: sample data preparation, model training, optimal sample size selection, comparison of LCNet-27 and LCNet-13, and comparison with traditional methods.

Fig. 4. LCNet model finetune land cover classification flow chart.


In the sample data preparation phase, sample data with different resolutions are collected with different pixel sizes and normalised to a standard input size. In the model training phase, labels are added to the acquired samples to form the training data. In the optimal sample size selection phase, the trained model with the highest accuracy is selected as the best model, and the classification results of models trained with different sample sizes are evaluated to determine the most suitable sample size. In the LCNet-27 versus LCNet-13 comparison phase, both models are trained with the best input size, classification experiments are performed, and the accuracy of the results is evaluated. In the traditional method comparison phase, the best sample size and the best model are used for training, and the resulting model is compared with traditional methods. The structure of LCNet-27 is shown in Fig. 5.
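The neighbourhood sampling used to build the training samples, and later to classify every pixel of the study areas with windows from $3\times3$ to $9\times9$, can be sketched as follows. This is an illustrative sketch under assumptions not stated in the text: a single band stored as nested lists, odd window sizes, replication of the nearest edge pixel at image borders, and negative labels marking unlabelled pixels. Real TM or QuickBird data would stack one such patch per band. The function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def neighbourhood_patch(band: Matrix, r: int, c: int, size: int) -> Matrix:
    """Return the size x size window centred on pixel (r, c).

    Pixels outside the image are filled by replicating the nearest edge pixel
    (an assumed border rule; the text does not specify one).
    """
    if size % 2 == 0:
        raise ValueError("neighbourhood size must be odd, e.g. 3, 5, 7 or 9")
    h, w, half = len(band), len(band[0]), size // 2

    def clamp(v: int, upper: int) -> int:
        return min(max(v, 0), upper - 1)

    return [[band[clamp(r + dr, h)][clamp(c + dc, w)] for dc in range(-half, half + 1)]
            for dr in range(-half, half + 1)]


def extract_samples(band: Matrix, labels: List[List[int]],
                    size: int) -> List[Tuple[Matrix, int]]:
    """Pair each labelled pixel's patch with its class (label < 0 means unlabelled)."""
    return [(neighbourhood_patch(band, r, c, size), labels[r][c])
            for r in range(len(band)) for c in range(len(band[0]))
            if labels[r][c] >= 0]


if __name__ == "__main__":
    img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(neighbourhood_patch(img, 0, 0, 3))
```

Larger windows give the CNN more spatial context per pixel but also more redundant data, which is the trade-off behind the filtering effect discussed in Section 4.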

Fig. 5. LCNet-27 model structure diagram.


As shown in Fig. 5, LCNet-27 is a CNN with three CLs, three pooling layers and two fully connected layers. Between the CLs and the fully connected and SoftMax layers, it reduces the $27 \times 27$ input to $6\times6$ feature maps. The structure of LCNet-13 is shown in Fig. 6.
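Size reductions such as $227\times227 \to 6\times6$ in AlexNet and $27\times27 \to 6\times6$ in LCNet-27 follow from the standard output-size formula for a convolution or pooling layer, $o = \lfloor (n + 2p - k)/s \rfloor + 1$ for input size $n$, kernel $k$, stride $s$ and padding $p$. The sketch below applies it to AlexNet's published layer settings. The text does not give the kernel sizes, strides or padding of LCNet-27 and LCNet-13, so no claim is made about their exact layers; the sketch only shows that 27 and 13 are intermediate feature map sizes in AlexNet, and that AlexNet's later layers take either size to $6\times6$. The names are hypothetical.

```python
from typing import List, Sequence, Tuple

Layer = Tuple[str, int, int, int]  # (name, kernel, stride, padding)


def out_size(n: int, k: int, s: int = 1, p: int = 0) -> int:
    """Spatial output size of a conv or pooling layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1


# AlexNet's published settings (single-tower view); pooling is 3x3 with stride 2.
ALEXNET: List[Layer] = [
    ("conv1", 11, 4, 0), ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2), ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1), ("conv4", 3, 1, 1), ("conv5", 3, 1, 1), ("pool5", 3, 2, 0),
]


def trace_sizes(n: int, layers: Sequence[Layer]) -> List[int]:
    """Return the input size followed by the output size after each layer."""
    sizes = [n]
    for _name, k, s, p in layers:
        n = out_size(n, k, s, p)
        sizes.append(n)
    return sizes


if __name__ == "__main__":
    print(trace_sizes(227, ALEXNET))      # → [227, 55, 27, 27, 13, 13, 13, 13, 6]
    print(trace_sizes(27, ALEXNET[2:]))   # from the 27x27 stage onwards
    print(trace_sizes(13, ALEXNET[4:]))   # from the 13x13 stage onwards
```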

Fig. 6. LCNet-13 model structure diagram.


As shown in Fig. 6, LCNet-13 is a CNN with two CLs and two fully connected layers, and it takes sample data of size $13 \times 13$ as input. In summary, to minimise (or maximise) the objective function, the study uses the LCNet models as the optimisation tool: the network parameters are optimised jointly and end to end under bit rate, signal distortion and semantic misalignment constraints. The two LCNet designs are computationally efficient while maintaining high classification accuracy, which suits the practical land cover classification scenario explored in this study.

4. Effect of Fine-Tuning the AlexNet Model and Results of Remote Sensing Land Cover Classification

4.1 Fine-Tuning of the AlexNet Model and the Effect of Multiple Factors on Classification Results

AlexNet was fine-tuned to make it more general and better suited to RSI classification. The training process of the fine-tuned AlexNet on TM images is shown in Fig. 7.

Fig. 7. AlexNet model finetune training accuracy improvement process.


Fig. 8. TM image classification results for different sample sizes.


Fig. 7 shows that on the TM data the model reached 90% training accuracy after about 4000 iterations, but training was slow because AlexNet contains five CLs and two fully connected layers. When fine-tuning AlexNet, the sample data were normalised to a standard size for model training. In this study, samples were acquired with neighbourhood sizes of $5\times5$, $7\times7$ and $9\times9$; with $5\times5$ TM samples as model input, a training accuracy of 94.34% was achieved. The trained models were then applied to the experimental area for an accuracy comparison: pixels were sampled with $5\times5$, $7\times7$ and $9\times9$ neighbourhoods as input and classified, and the results were evaluated for accuracy. The final TM image classification results are shown in Fig. 8.

The results in Fig. 8(b) show a clear filtering effect. The model with the highest training accuracy used $5\times5$ samples, and in the actual classification the $5\times5$ samples also gave the best classification ability and the highest accuracy. The filtering effect grows with sample size and is most pronounced for the $9\times9$ samples. Because the CNN fuses features from the whole input window, this suggests that too large a window obscures the central pixel being classified and that the additional redundant data degrade the network. The resulting images clearly show that $5\times5$ samples yield better classification results than the other sizes. For the TM images in this study, the results were also compared with those of two SVM classifiers using the standard $5\times5$ input size, one based on spectral features and the other on spectral and texture features. The resulting TM classification results are shown in Fig. 9.

Fig. 9. TM classification results of different classification methods.


As shown in Fig. 9, classification with spectral features alone or with spectral plus texture features produces many small patches, such as those in the box in Fig. 8(b), which must be removed after classification; bare ground, for example, is clearly misclassified as shadow and grass. Adding texture features reduces the patches caused by classification errors, and the results of the fine-tuned AlexNet model show even less patchiness, so no post-classification processing is needed. The accuracy of the classification results for different sample sizes and classification methods is compared in Fig. 10.

Fig. 10. Overall classification accuracy of TM classification results with different sample sizes and methods.


Fig. 10(a) compares the accuracy of the classification results for different sample sizes and classification methods, and Fig. 10(b) compares the classification accuracy of the methods. The method combining spectral and texture features is more accurate than the method using spectral features alone, and for every method the overall accuracy is higher than the Kappa coefficient.

4.2 Results of LCNet-27 and LCNet-13 Model Training and Remote Sensing Land Cover Classification Applications

Compared with AlexNet, the LCNet networks have fewer layers and smaller input images, so they learn significantly faster. The training process of both models on TM and QuickBird images is shown in Fig. 11.

Fig. 11. LCNet training process


Figs. 11(a) and 11(b) show the training process on TM images, and Figs. 11(c) and 11(d) show it on QuickBird images. On both TM and QuickBird data the models quickly reached an accuracy of nearly 90% after about 1000 training iterations; 1000 iterations took 20 minutes for LCNet-27 and 13 minutes for LCNet-13, much faster than fine-tuning AlexNet. On the TM data, the best accuracy was 97.76% for LCNet-27 and 95.33% for LCNet-13, both with $5\times5$ pixel inputs. On the QuickBird data, the best results were obtained with $7\times7$ pixel inputs: 98.13% for LCNet-27 and 96.04% for LCNet-13. Because LCNet-13 is less accurate than LCNet-27, LCNet-27 was used for the optimal sample size selection, and the accuracy obtained with different sizes was compared and analysed. Each pixel in the TM study area was sampled with $3\times3$, $5\times5$, $7\times7$ and $9\times9$ neighbourhoods and fed into the LCNet-27 model trained with the corresponding sample size. In the QuickBird test area, each pixel was sampled with $5\times5$, $7\times7$ and $9\times9$ neighbourhoods and its category was determined in the same way. The accuracy of the TM and QuickBird classification results is compared in Fig. 12.

Fig. 12. Accuracy of classification results for TM and QuickBird images.


Fig. 12 shows that, for the TM images, the $5\times5$ neighbourhood gives the highest classification accuracy, and accuracy decreases as the size increases beyond that. At $3\times3$ the samples contain little neighbourhood information, so the CNN has difficulty extracting features that integrate it and the image is poorly classified. The $9\times9$ classification result shows a strong filtering effect: image detail is lost and the edges between categories are smoothed. Medium-resolution TM data therefore achieve the highest overall classification accuracy with $5\times5$ samples, which preserve image detail better than $7\times7$ and $9\times9$ samples. For the high-resolution QuickBird images, $7\times7$ samples give the highest overall accuracy, and a filtering effect starts to appear only in the $9\times9$ result. Because the high-resolution QuickBird images have better spatial characteristics, the filtering effect is weaker and classification details are better preserved than for the medium-resolution TM images. The accuracy of QuickBird classification varies little with sample size, and its improvement in accuracy is greater. The effect of different classification methods on classification accuracy is shown in Fig. 13.

Fig. 13. Overall classification accuracy and Kappa coefficient of different methods for TM images and QuickBird images


Fig. 13 shows that the classification accuracy and Kappa coefficient of LCNet-27 are better than those of the other two models. Classification that adds texture features to spectral features was more accurate than classification using spectral features only.

5. Conclusion

Human demand for land use is increasing, land cover classification has become a research hot spot, and remote sensing has become a common tool for exploring land cover because of its wide coverage and high efficiency. The powerful feature learning ability of the AlexNet CNN is increasingly applied to land classification, and fine-tuning AlexNet allows it to be applied more generally. To resolve the mismatch between AlexNet's input size and the sample size, the LCNet-27 and LCNet-13 models were proposed as optimisations of this model. On the TM data, the best accuracy was 97.76% for LCNet-27 and 95.33% for LCNet-13, both with $5\times5$ pixel inputs. On the QuickBird data, LCNet-27 reached 98.13% and LCNet-13 reached 96.04%, both with $7\times7$ pixel inputs. LCNet-13 is less accurate than LCNet-27, which indicates that the LCNet-27 design is the more effective optimisation. These models overcome the limitations of AlexNet in this application, improve classification accuracy and speed, and provide a more practical tool for land cover classification. The results are significant for the practical use of remote sensing in monitoring land use and land cover change. However, this study has limitations: the filtering effect observed in the results was neither investigated in depth nor addressed, and it should be studied in future research.

REFERENCES

[1] A. Bhasin, P. Dolker, and P. Raina, ``Land use and land cover change detection using remote sensing in the trans Himalayan region of Ladakh, India,'' ECS Transactions, vol. 107, no. 1, pp. 2985-2997, 2022.
[2] Q. Liu, G. Zhai, and X. Lu, ``Integrated land-sea surveying and mapping of intertidal zone based on high-definition remote sensing images and GIS technology,'' Microprocessors and Microsystems, vol. 82, no. 4, 103937, 2021.
[3] Y. Yang and X. Song, ``Research on face intelligent perception technology integrating deep learning under different illumination intensities,'' Journal of Computational and Cognitive Engineering, vol. 1, no. 1, pp. 32-36, 2022.
[4] G. Chen and Z. Chen, ``Regional classification of urban land use based on fuzzy rough set in remote sensing images,'' Journal of Intelligent and Fuzzy Systems, vol. 38, no. 4, pp. 3803-3812, 2020.
[5] B. Ekim and E. Sertel, ``Deep neural network ensembles for remote sensing land cover and land use classification,'' International Journal of Digital Earth, vol. 14, no. 12, pp. 1868-1881, 2021.
[6] X. Liu, C. He, Q. Zhang, and M. Liao, ``Statistical convolutional neural network for land-cover classification from SAR images,'' IEEE Geoscience and Remote Sensing Letters, vol. 17, no. 9, pp. 1548-1552, 2020.
[7] B. Ragupathy and M. Karunakaran, ``A fuzzy logic-based meningioma tumor detection in magnetic resonance brain images using CANFIS and U-Net CNN classification,'' International Journal of Imaging Systems and Technology, vol. 31, no. 1, pp. 379-390, 2021.
[8] X. Zhao, M. Zhang, and J. Zhang, ``Ensemble learning-based CNN for textile fabric defects classification,'' International Journal of Clothing Science and Technology, vol. 33, no. 4, pp. 664-678, 2021.
[9] M. Raichura, N. Chothani, and D. Patel, ``Efficient CNN-XGBoost technique for classification of power transformer internal faults against various abnormal conditions,'' IET Generation, Transmission and Distribution, vol. 15, no. 5, pp. 972-985, 2021.
[10] Q. Chu, G. Liu, and X. Zhu, ``Visualization feature and CNN based homology classification of malicious code,'' Chinese Journal of Electronics, vol. 29, no. 1, pp. 154-160, 2020.
[11] A. D. Algarni, W. El-Shafai, G. Banby, F. E. EL-Samie, and N. F. Soliman, ``An efficient CNN-based hybrid classification and segmentation approach for COVID-19 detection,'' Computers, Materials & Continua, vol. 3, pp. 4393-4410, 2022.
[12] S. Fredrik, F. Cecilia, M. Francisco, J. Mats, O. Eva, L. Niklas, and R. Magnus, ``Convolutional neural networks for segmentation of FIB-SEM nanotomography data from porous polymer films for controlled drug release,'' Journal of Microscopy, vol. 283, no. 1, pp. 51-63, 2021.
[13] M. Shahedi, J. D. Dormer, M. Halicek, and B. Fei, ``Technical note: The effect of image annotation with minimal manual interaction for semiautomatic prostate segmentation in CT images using fully convolutional neural networks,'' Medical Physics, vol. 49, no. 2, pp. 1153-1160, 2022.
[14] X. Y. Tong, G. S. Xia, Q. Lu, H. Shen, and L. Zhang, ``Land-cover classification with high-resolution remote sensing images using transferable deep models,'' Remote Sensing of Environment, vol. 237, no. 2, 111322, 2020.
[15] M. E. Jijon-Palma, J. Kern, C. Amisse, and J. A. Silva Centeno, ``Improving stacked-autoencoders with 1D convolutional-nets for hyperspectral image land-cover classification,'' Journal of Applied Remote Sensing, vol. 2, 15, 2021.
[16] C. Zhang, L. Zhang, B. Zhang, J. Sun, S. Dong, X. Wang, Y. Li, W. Xu, W. Chu, and Y. Dong, ``Land cover classification in a mixed forest-grassland ecosystem using LResU-Net and UAV imagery,'' Journal of Forestry Research, vol. 33, no. 3, pp. 923-936, 2022.
[17] J. Liao, J. Cao, K. Wang, and X. Zhen, ``Land cover classification from very high spatial resolution images via multiscale object-driven CNNs and automatic,'' Journal of Applied Remote Sensing, vol. 16, no. 1, pp. 13-42, 2022.
[18] K. Kulkarni and P. A. Vijaya, ``Using combination technique for land cover classification of optical multispectral images,'' International Journal of Applied Geospatial Research, vol. 12, no. 4, pp. 22-39, 2021.
[19] Y. Cui, Z. Yu, J. Han, S. Gao, and L. Wang, ``Pyramidal and conditional convolution attention network for hyperspectral image classification using limited training samples,'' International Journal of Remote Sensing, vol. 43, no. 8, pp. 2885-2914, 2022.
[20] C. J. Legleiter, B. J. Sansom, and R. B. Jacobson, ``Remote sensing of visible dye concentrations during a tracer experiment on a large, turbid river,'' Water Resources Research, vol. 58, no. 4, pp. 61-83, 2022.
[21] S. Sadhana and R. Mallika, ``An intelligent technique for detection of diabetic retinopathy using improved Alexnet model based convolutional neural network,'' Journal of Intelligent and Fuzzy Systems: Applications in Engineering and Technology, vol. 40, no. 4, pp. 7623-7634, 2021.
[22] W. L. Chin, Q. Zhang, and T. Jiang, ``Low-complexity neuron for fixed-point artificial neural networks with ReLU activation function in energy-constrained wireless applications,'' IET Communications, vol. 15, no. 7, pp. 917-923, 2021.
[23] J. Wang, J. Zhou, and W. Huang, ``Attend in bands: Hyperspectral band weighting and selection for image classification,'' IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 12, no. 12, pp. 4712-4727, 2020.
[24] H. Chen, Y. Qiu, D. Yin, J. Chen, X. Chen, S. Liu, and L. Liu, ``Stacked spectral feature space patch: An advanced spectral representation for precise crop classification based on convolutional neural network,'' Acta Agronomica Sinica, vol. 10, no. 5, pp. 1460-1469, 2022.
Shan Tong

Shan Tong received her master's degree in public administration from Central China Normal University (2016). She is currently working at the Organization Department of Shijiazhuang College of Applied Technology. She has participated in a number of research projects, including the analysis and solution of employment and social security issues for land-expropriated farmers in Shijiazhuang. Her areas of interest include land resource management and geographic information science.

Yuting Zhang

Yuting Zhang graduated from Hebei University of Economics and Business with a master's degree in journalism. She now works at Shijiazhuang College of Applied Technology. Her research interests are journalism and higher vocational education.

Shaokang Li

Shaokang Li received his master's degree in science from Hebei Normal University (2016). He is currently working at the Information Technology Center of Hebei Normal University. He has actively participated in various research projects, including those related to the development model and evolution law of declining rural areas in the Nihewan Basin. His academic interests encompass statistics, computer science, and geographic information science.