
  1. (School of Design Art, Changsha University of Science and Technology, Changsha 410114, China)
  2. (School of Elementary Education, Hunan First Normal University, Changsha 410110, China)



Plant remote sensing images, Landscape design, ResNet50, Image classification, Maximum inter-class variance, Mask

1. Introduction

In today's era of rapid technological development, remote sensing technology plays an important role in obtaining surface information and environmental data [1]. It is especially crucial in plant landscape design and environmental management, where remote sensing image recognition is widely used [2]. Since the beginning of the 21st century, the rapid development of machine learning has brought revolutionary changes to remote sensing image recognition [3]. In particular, the application of deep learning algorithms has greatly improved the processing capability and accuracy for remote sensing images [4,5]. Among these algorithms, convolutional neural networks show outstanding performance in image recognition. ResNet50, an important variant of convolutional neural networks, has shown significant advantages in processing remote sensing images owing to its ``residual learning'' mechanism. However, although ResNet50 performs well in plant remote sensing image recognition, it still faces challenges when applied to landscape design. For example, the resolution, lighting conditions, and seasonal changes of remote sensing images may all affect recognition accuracy. Therefore, this study proposes a plant remote sensing image recognition model based on an improved ResNet50 network, aiming to improve the accuracy of intelligent plant recognition in remote sensing images and to provide landscape designers with higher-quality vegetation data.

The study is organized as follows. The first part introduces the main technical routes of remote sensing image recognition and the role of plant remote sensing image recognition in landscape design. The second part designs an adaptive threshold binary mask algorithm based on the maximum inter-class variance and an image classification algorithm based on an improved ResNet50 network, and combines the two into a plant remote sensing image recognition model. The third part presents the experimental verification. Finally, the fourth part summarizes the study and discusses its conclusions, advantages, disadvantages, and significance.

2. Related Works

The core theme of this study is remote sensing image recognition, which has recently become a research hotspot and has been studied extensively. Trotta G et al. proposed a method combining field observations with remote sensing data to study the effects of wildfire intensity on plant communities and alien plant invasions. The vegetation status of 35 plots under different fire intensities was evaluated from satellite images using the differenced normalized burn ratio [6]. Rygalova N V combined remote sensing with dendrological methods to study the diameter growth increment of tree trunks, and determined the climatic factors affecting the dynamics of the normalized difference vegetation index (NDVI) and tree growth increment, including the limiting effect of summer temperature and the positive influence of precipitation in the preceding winter and the following summer [7]. To address the impact of urban greening on seasonal allergic rhinitis, Guo YD et al. proposed a comprehensive 10-year urban greening analysis method using remote sensing data, obtained an average annual increase of 0.51 in vegetation cover in Tianjin, and discussed its impact on pollen distribution and allergic rhinitis [8]. Gasela M et al. analyzed simulated hyperspectral data of the upcoming sensor nSight-2 for the species-level mapping of wetland ecosystems, and the overall results showed that all the evaluated classifiers achieved acceptable mapping accuracy [9]. To address the invasion of water hyacinth in Lake Tana, Yismaw B et al. proposed a method to analyze the potential water hyacinth coverage area using satellite images and water nutrient levels. Landsat 7 ETM+ and Landsat 8 images were used, with digital number (DN) values converted into top-of-atmosphere (TOA) reflectance. Spectral indices such as the normalized difference water index (NDWI) and NDVI were used for evaluation, and the best overall accuracy and Kappa coefficient were achieved [10].

In summary, the ResNet50 model has high recognition accuracy and robustness, which makes it an important tool for image processing. However, many existing studies rely primarily on a single data source, which can compromise data integrity and reliability; they do not take full account of temporal and spatial scale differences when processing data; and their choice of classification methods may not fully consider how different algorithms perform in practice, which limits the improvement of mapping accuracy. In view of this, this study proposes an improved ResNet50 model for plant remote sensing image recognition and landscape environment design. It aims to combine ground observation data with remote sensing data to enhance the model's understanding of vegetation status and its dynamic change. Analysis methods at multiple temporal and spatial scales are applied to assess the impact of climate factors on vegetation more comprehensively.

3. Plant Remote Sensing Image Recognition Method Based on Segmentation Mask and Improved ResNet50

Obtaining landscape plant image data with drones and performing intelligent image recognition can quickly provide management personnel with information on the overall vegetation distribution and vegetation type structure of a landscape, which helps them manage landscape vegetation more scientifically and monitor vegetation growth. This study designs an adaptive-threshold-based segmentation and mask algorithm for plant remote sensing images, as well as an improved ResNet50 recognition algorithm that incorporates Squeeze-and-Excitation (SE) channel attention. The two are combined to form a plant remote sensing image recognition model for landscape design.

3.1. Plant Remote Sensing Image Segmentation and Mask Algorithm Based on Adaptive Threshold

Firstly, an enhancement processing module is designed for the dataset in the plant remote sensing image recognition model [11,12]. Because the original remote sensing images have high resolution and large size, using them directly would slow down model training [13-15]. Therefore, the original image is downsampled so that its length and width are reduced in equal proportion to $\tau $ times the original, that is, the original image is sampled every $1/\tau $ pixels in the row and column directions [16]. Considering the high resolution of the dataset in this study, $\tau $ is set to 0.2. Because the background of landscape plant remote sensing images contains a large amount of environmental noise, filtering is also required [17,18]. To minimize the loss of true information in the denoised image, median filtering is chosen. The image also needs to undergo a contrast stretching transformation: in images of locations with concentrated ground features, the radiation intensities are similar and the contrast is low, which makes recognition more difficult. Because nonlinear stretching is sensitive to its parameters and can affect the stability of the recognition model, grayscale stretching, a form of linear stretching, is more suitable for the dataset of this study. The grayscale histogram of the image reveals its brightness and contrast characteristics. Let the grayscale range of image $f(i,j)$ be $[a, b]$ and the grayscale range of image $g(i,j)$ after linear transformation be $[a', b']$; then $g(i,j)$ is calculated according to Eq. (1).

(1)
$ g(i,j)=\frac{b'-a'}{b-a} \left[f(i,j)-a\right]+a' . $
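The downsampling step and the linear stretch of Eq. (1) can be illustrated with a minimal sketch. It assumes a grayscale image stored as a nested list, takes $[a, b]$ as the minimum and maximum gray values of the input, and targets the output range $[0, 255]$; the function names and the toy image are hypothetical and are not taken from the study.

```python
def downsample(img, tau=0.2):
    """Keep every (1/tau)-th pixel along rows and columns."""
    step = round(1 / tau)
    return [row[::step] for row in img[::step]]


def linear_stretch(img, out_lo=0, out_hi=255):
    """Eq. (1): map the gray range [a, b] of img linearly onto [out_lo, out_hi]."""
    flat = [v for row in img for v in row]
    a, b = min(flat), max(flat)
    if a == b:  # constant image: there is no range to stretch
        return [[out_lo for _ in row] for row in img]
    scale = (out_hi - out_lo) / (b - a)
    return [[scale * (v - a) + out_lo for v in row] for row in img]


# Toy 10 x 10 low-contrast image with gray values between 50 and 68.
image = [[50 + r + c for c in range(10)] for r in range(10)]
small = downsample(image, tau=0.2)   # 2 x 2 image
stretched = linear_stretch(small)    # now spans the full output range
```

With $\tau = 0.2$ the sketch keeps every fifth pixel, so the toy image shrinks from 10 × 10 to 2 × 2 before its gray range is stretched.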

To be precise, a piecewise linear transformation is used here, which highlights the grayscale range of the target of interest and suppresses the ranges that are not of interest. Let the grayscale interval of the target of interest in the initial image $f(i,j)$ be $[a, b]$, and the full grayscale interval of the image be $[0, M_{f}]$. Eq. (2) expands the grayscale range of the target of interest to $[c, d]$, where $M_{g}$ denotes the maximum grayscale value of the transformed image.

(2)
$ g(i,j)=\left\{\begin{array}{ll} \dfrac{c}{a} f(i,j), & 0\le f(i,j)\le a,\\ \dfrac{d-c}{b-a}\left[f(i,j)-a\right]+c, & a\le f(i,j)\le b,\\ \dfrac{M_{g} -d}{M_{f} -b} \left[f(i,j)-b\right]+d, & b\le f(i,j)\le M_{f}. \end{array}\right. $
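As a worked illustration of the piecewise transformation, the sketch below evaluates the three branches for a single gray value. The breakpoints $a=100$, $b=150$, $c=50$, $d=200$ and the maxima $M_f = M_g = 255$ are illustrative assumptions, not values reported in the study, and the function name is hypothetical.

```python
def piecewise_stretch(v, a, b, c, d, m_f=255, m_g=255):
    """Piecewise linear stretch of one gray value v (requires 0 < a < b < m_f)."""
    if v <= a:
        # Compress the range below the target of interest.
        return c / a * v
    if v <= b:
        # Expand the target range [a, b] onto [c, d].
        return (d - c) / (b - a) * (v - a) + c
    # Compress the range above the target of interest.
    return (m_g - d) / (m_f - b) * (v - b) + d


samples = (0, 100, 125, 150, 255)
values = [piecewise_stretch(v, a=100, b=150, c=50, d=200) for v in samples]
```

In this example the interval $[100, 150]$ is mapped onto $[50, 200]$ with slope 3, while both outer intervals are compressed with slopes below 1, and the mapping stays continuous at $a$ and $b$.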

In summary, the principles of the linear stretching transformation and the piecewise linear stretching transformation are shown in Fig. 1.

Fig. 1. Principle demonstration of linear and piecewise linear stretching transformation.

../../Resources/ieie/IEIESPC.2025.14.5.631/fig1.png

Because non-ornamental buildings are not objects of landscape design, an algorithm is needed to mask them. This study proposes a binary mask algorithm with an adaptive threshold based on the maximum inter-class variance. The binary image mask method is simple to process, has low complexity, and supports easy logical operations, and the images to be processed have already been denoised, so the binary image method is chosen for mask processing. A binary image contains only two values, 0 and 1, which are determined by a grayscale threshold $T$: pixels not higher than $T$ are mapped to 0, and all other pixels are mapped to 1. Finding a reasonable segmentation threshold is therefore the key to the mask. Here, the maximum inter-class variance method, which is fast to compute and gives a clear segmentation, is used to find the threshold. The threshold calculation process based on this method is as follows. Firstly, for image $I(x,y)$, a threshold $T$ is set to distinguish between foreground and background. This threshold divides the $n$ gray levels of the image into two parts, $C_{1} =\{0, 1, 2, ..., T\}$ and $C_{2} =\{T+1, T+2, ..., n-1\}$. The proportion of foreground pixels in the total image is $\omega _{0} $, and their average grayscale value is $\mu _{0} $. The proportion of background pixels in the total image is $\omega _{1} $, and their average grayscale value is $\mu _{1} $. The total average grayscale of the image is $\mu $, and the inter-class variance is $\lambda $. If the image size is $M\times N$, the number of pixels with grayscale values not greater than $T$ is $N_{0} $, and the number of pixels with grayscale values greater than $T$ is $N_{1} $. Therefore, in the first step, the foreground ratio and background ratio can be calculated, as shown in Eq. (3).

(3)
$ \left\{\begin{aligned} & \omega _{0} =\frac{N_{0}}{M\times N}, \\ & \omega _{1} =\frac{N_{1}}{M\times N}. \end{aligned}\right. $

The second step is to calculate the total number $N_{all} $ of pixels, as shown in Eq. (4).

(4)
$ N_{all} =M\times N . $

Considering that the sum of background probability and foreground probability is 1, the average grayscale value can be calculated according to Eq. (5).

(5)
$ \mu =\omega _{0} \cdot \mu _{0} +\omega _{1} \cdot \mu _{1}. $

The third step is to calculate the inter class variance, as shown in Eq. (6).

(6)
$ \lambda =\omega _{0} \left(\mu _{0} -\mu \right)^{2} +\omega _{1} \left(\mu _{1} -\mu \right)^{2}. $

The gray level $T_{k} $ that maximizes Eq. (6) is taken as the required binary threshold. With the binary mask based on the maximum inter-class variance method, the main framework of the mask area can be extracted. To further refine the details, a series of morphological operations is applied to the segmented image to improve the mask. The morphological operations used here are erosion and dilation. From a mathematical perspective, erosion or dilation is a convolution-like operation between a complete or partial image (denoted as $A$) and its structuring element, or kernel (denoted as $B$).
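The threshold search of Eqs. (3)-(6) can be sketched as an exhaustive scan over all candidate gray levels. This is a minimal illustration that assumes integer gray values in the range 0 to 255 supplied as a flat list; the function names and the toy pixel data are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level T that maximizes the inter-class variance."""
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    n_all = len(pixels)                       # Eq. (4): total pixel count
    total_sum = sum(g * h for g, h in enumerate(hist))
    mu = total_sum / n_all                    # overall mean gray level
    best_t, best_var = 0, -1.0
    n0, sum0 = 0, 0
    for t in range(levels - 1):
        n0 += hist[t]                         # pixels with gray value <= t
        sum0 += t * hist[t]
        n1 = n_all - n0                       # pixels with gray value > t
        if n0 == 0 or n1 == 0:
            continue                          # one class is empty, skip
        w0, w1 = n0 / n_all, n1 / n_all       # Eq. (3)
        mu0, mu1 = sum0 / n0, (total_sum - sum0) / n1
        var = w0 * (mu0 - mu) ** 2 + w1 * (mu1 - mu) ** 2   # Eq. (6)
        if var > best_var:                    # keep the first maximizer
            best_t, best_var = t, var
    return best_t


def binarize(pixels, t):
    """Map pixels not higher than t to 0 and all others to 1."""
    return [0 if v <= t else 1 for v in pixels]


# Toy data: a dark cluster around 10-12 and a bright cluster around 200-210.
pixels = [10] * 50 + [12] * 50 + [200] * 40 + [210] * 60
threshold = otsu_threshold(pixels)
mask = binarize(pixels, threshold)
```

Every threshold between the two clusters yields the same variance, so the scan returns the lowest such gray level, which separates the 100 dark pixels from the 100 bright ones.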

In the erosion calculation, assuming that $B$ erodes $A$, the calculation method is shown in Eq. (7).

(7)
$ A\ominus B=\{z \mid (B)_{z} \subseteq A\} . $

That is, the erosion of $A$ by $B$ is the set of points $z$ for which $B$, after translation by $z$, is completely contained within $A$. Equivalently, the intersection of the translated $B$ with the complement of $A$ is empty, as shown in Eq. (8).

(8)
$ A\ominus B=\{z\mid(B)_{z} \cap A^{c} =\varnothing \} . $

Similarly, the dilation of set $A$ by $B$ is denoted as $A\oplus B$ and is described by Eq. (9).

(9)
$ A\oplus B=\{z\mid[(\hat{B})_{z} \cap A] \subseteq A\} . $

That is to say, the set of points $z$ for which the reflection $\hat{B}$ of $B$, after translation by $z$, intersects $A$ forms the dilation of $A$ by $B$. The intersection of $A$ with the translated reflection of $B$ must not be empty, that is, Eq. (10) holds.

(10)
$ A\oplus B=\{z\mid[(\hat{B})_{z} \cap A] \ne \varnothing \}. $
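The set definitions of the two morphological operations can be checked on a small example. The sketch below represents the foreground $A$ and the kernel $B$ as sets of (row, column) coordinates and restricts the candidate points $z$ to a finite grid; the cross-shaped kernel, the grid size, and the function names are illustrative assumptions.

```python
def translate(b, z):
    """Shift every point of the set b by the offset z."""
    return {(r + z[0], c + z[1]) for r, c in b}


def erode(a, b, shape):
    """All z on the grid whose translated kernel lies entirely inside a."""
    rows, cols = shape
    return {(r, c) for r in range(rows) for c in range(cols)
            if translate(b, (r, c)) <= a}


def dilate(a, b, shape):
    """All z on the grid whose translated, reflected kernel intersects a."""
    rows, cols = shape
    b_hat = {(-r, -c) for r, c in b}          # reflection of the kernel
    return {(r, c) for r in range(rows) for c in range(cols)
            if translate(b_hat, (r, c)) & a}


# A 3 x 3 foreground square inside a 5 x 5 grid, and a cross-shaped kernel.
A = {(r, c) for r in range(1, 4) for c in range(1, 4)}
B = {(0, 0), (0, 1), (1, 0), (0, -1), (-1, 0)}
eroded = erode(A, B, shape=(5, 5))
dilated = dilate(A, B, shape=(5, 5))
```

Here the cross fits inside the square only at its center, so the erosion shrinks the square to a single point, while the dilation grows it to every grid cell except the four corners.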

In summary, the morphological processing of the image after binary segmentation proceeds as follows. The first step is to remove small particles so that real targets are not masked. The second step is to connect the disconnected parts through erosion. The third step is to perform dilation to cover adjacent areas. The fourth step is to fill the internal holes. Finally, the binary image is inverted and multiplied element-wise with the original image to generate the masked region. At this point, the segmentation mask algorithm for plant remote sensing images has been designed, and the overall process is shown in Fig. 2.

Fig. 2. Segmentation mask algorithm for plant remote sensing image.


3.2. Improved ResNet50 Plant Remote Sensing Image Recognition Model with Mixed SE Channel Attention

After preprocessing and segmentation masking, the plant remote sensing images are input into the recognition model. The ResNet50 neural network has excellent feature recognition ability and is therefore chosen as the basis of the recognition model. To further enhance its ability to recognize key features and its performance on small-sample datasets, the algorithm is improved as follows.

Landscape recognition tasks involve diverse types of landscapes and vegetation, and some recognition objects have similar shapes. Therefore, an attention module based on SE channels is added after each Bottleneck block in ResNet50. The SE module adaptively recalibrates the features extracted by the convolutional layers through two steps, ``Squeeze'' and ``Excitation''. In the ``Squeeze'' phase, the module applies global average pooling to each feature channel, generating a vector that describes the global information of the channels. In the ``Excitation'' phase, the channel importance is remapped using fully connected layers, so that the network focuses more on the features that contribute to plant recognition and can effectively extract key plant features when dealing with crops in similar areas. The final ResNet50 structure, which integrates the SE modules and the transfer learning module, is shown in Fig. 3.
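The two phases of the SE module can be sketched on a tiny feature map. The example below follows the common SE formulation (global average pooling, a bottleneck of two fully connected layers with ReLU and sigmoid, then channel-wise scaling) with biases omitted; the weights, the reduction to a single hidden unit, and the function name are assumptions made only for illustration and do not reproduce the trained model.

```python
import math


def se_recalibrate(feature_map, w1, w2):
    """feature_map: list of C channels, each an H x W nested list.
    w1: hidden x C weights, w2: C x hidden weights (biases omitted)."""
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in feature_map]
    # Excitation: fully connected layer + ReLU, then fully connected + sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)))
              for row in w1]
    scales = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
              for row in w2]
    # Recalibration: multiply every channel by its learned importance.
    out = [[[v * s for v in row] for row in ch]
           for ch, s in zip(feature_map, scales)]
    return out, scales


# Two 2 x 2 channels with hypothetical weights.
fmap = [[[1.0, 1.0], [1.0, 1.0]],
        [[3.0, 3.0], [3.0, 3.0]]]
w1 = [[0.5, 0.5]]        # 2 channels -> 1 hidden unit
w2 = [[1.0], [-1.0]]     # 1 hidden unit -> 2 channel scales
recalibrated, scales = se_recalibrate(fmap, w1, w2)
```

With these weights the first channel receives a scale close to 0.88 and the second a scale close to 0.12, which shows how the module can emphasize one channel and suppress another.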

Fig. 3. Improved ResNet50 structure incorporating SE channel attention module.


The detailed structure of each convolution module in Fig. 3 is shown in Table 1. In each entry, the first value after ``Conv'' is the convolution kernel size, the second value is the number of neurons (channels) in the convolutional module, ``Stride'' is the step size of the corresponding convolutional module, and ``FC'' represents the fully connected layer.

Table 1. Detailed structure of the improved ResNet50 convolution module incorporating SE channel attention.

| Number | Convolutional layer | Convolutional parameters | Output size |
|---|---|---|---|
| #1 | Conv1 | Conv1, 14×14, 128, stride 2 | 224×224 |
| #2 | Conv2 | Max pool, 3×3, stride; [Conv2_1, 1×1, 128; Conv2_2, 3×3, 128; Conv2_3, 1×1, 128] ×3 | 112×112 |
| #3 | Conv3 | [Conv3_1, 1×1, 256; Conv3_2, 3×3, 256; Conv3_3, 1×1, 1024] ×4 | 56×56 |
| #4 | Conv4 | [Conv4_1, 1×1, 512; Conv4_2, 3×3, 512; Conv4_3, 1×1, 2048] ×6 | 28×28 |
| #5 | Conv5 | [Conv5_1, 1×1, 1024; Conv5_2, 3×3, 1024; Conv5_3, 1×1, 2048] ×3 | 14×14 |

Considering that the training dataset in plant remote sensing landscape recognition tasks may be small in scale and may cover too few plant and landscape species, transfer learning is integrated into the construction of the recognition model. Transfer learning is divided into four types: feature-based, sample-based, relationship-based, and model-parameter-based transfer. Considering the difficulty of implementation and the type and scale of current academic plant image datasets, model-parameter-based transfer learning is chosen here. Specifically, an image classification model is first trained on the ImageNet dataset, and its parameters are used to initialize the convolutional layer parameters of the improved ResNet50 algorithm.

The SE attention module in the model uses convolution operations to obtain channel weights and then fuses these weights with the feature map, so that more important data receive higher weights. The SE-type attention module is chosen because it adds few parameters to the algorithm and can capture the correlation between different channels, which makes it well suited to the ResNet50 structure in this study. Specifically, the SE module is integrated into the residual module of ResNet50, enabling the algorithm to better learn the weight information of different channels in the feature map. The structure of the residual module that integrates SE channel attention is shown in Fig. 4.

Fig. 4. The residual module structure that integrates SE channel attention.


The loss function of the ResNet50 algorithm is also redesigned. In plant remote sensing image recognition, there is a serious class imbalance between different types of plants and landscapes. The cross-entropy loss (CrossEntropyLoss) weights all samples equally, which leads to poor performance on hard-to-classify samples. Therefore, the Focal Loss function is chosen here to train the ResNet50 algorithm, and this function, $focal\_ loss(\rho _{t} )$, is calculated as shown in Eq. (11).

(11)
$ focal\_ loss(\rho _{t} )=-\alpha (1-\rho _{t} )^{\gamma } \log (\rho _{t} ). $
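Eq. (11) can be evaluated directly for a single sample. In the sketch below, `p_t` plays the role of $\rho_t$ and is taken to be the predicted probability of the true class, as in the standard focal loss; $\gamma = 2.3$ is the value used in this study, whereas $\alpha = 1.0$ is an assumed placeholder because the category weights are not reported.

```python
import math


def focal_loss(p_t, alpha=1.0, gamma=2.3):
    """Eq. (11) for one sample with true-class probability p_t in (0, 1]."""
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(p_t):
    """Plain cross-entropy for comparison (focal loss with gamma = 0)."""
    return -math.log(p_t)


easy, hard = 0.9, 0.1   # a well-classified and a poorly classified sample
easy_ratio = focal_loss(easy) / cross_entropy(easy)   # equals 0.1 ** 2.3
hard_ratio = focal_loss(hard) / cross_entropy(hard)   # equals 0.9 ** 2.3
```

Relative to cross-entropy, the easy sample is down-weighted by a factor of $0.1^{2.3}$ while the hard sample keeps a factor of $0.9^{2.3}$, so hard samples dominate the total loss.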

In Eq. (11), $\alpha $ represents the category weight; $\rho _{t} $ represents the algorithm's ability to recognize the sample (in the standard focal loss, the predicted probability of the true class), with a larger $\rho _{t} $ indicating stronger recognition ability; and $\gamma $ is the coefficient that controls the weight of samples of different classification difficulty in the loss function. After multiple experiments, $\gamma $ is set to 2.3 in this study. To prevent overfitting, a random dropout module is incorporated into the network and placed after each SE module. Assume that each neuron stops working with probability $p$ during random dropout. Before dropout is applied, the total input of neuron $i$ is calculated according to Eq. (12), and the output is activated and converted according to Eq. (13).

(12)
$ z_{i}^{(l+1)} =w_{i}^{(l+1)} y^{l} +b_{i}^{(l+1)} . $

In Eq. (12), $z_{i}^{(l+1)} $ is the total input, $w_{i}^{(l+1)} $ is the neuron weight coefficient, $y^{l} $ is the output connected to the corresponding neuron in layer $l$, and $b_{i}^{(l+1)} $ is the bias coefficient.

(13)
$ y_{i}^{(l+1)} =f\left(z_{i}^{(l+1)} \right) . $

In Eq. (13), $y_{i}^{(l+1)} $ represents the corresponding output, and $f(\cdot )$ represents the activation function. When random dropout is performed with dropout probability $p$, the neuron input is as shown in Eq. (14).

(14)
$ z_{i}^{(l+1)} =w_{i}^{(l+1)} \tilde{y}^{(l)} +b_{i}^{(l+1)} . $
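A minimal sketch of the dropout step follows. Each activation of layer $l$ is zeroed independently with probability $p$ and the surviving activations feed the next neuron; the sketch applies no rescaling of the kept activations because none is described here, and the function names, the seed, and the toy weights are hypothetical.

```python
import random


def dropout(y, p, rng):
    """Zero each activation of y independently with probability p."""
    return [0.0 if rng.random() < p else v for v in y]


def neuron_input(w, y_tilde, b):
    """Total input of one neuron: weighted sum of thinned activations plus bias."""
    return sum(wi * yi for wi, yi in zip(w, y_tilde)) + b


rng = random.Random(0)                # fixed seed for reproducibility
y = [1.0] * 1000                      # activations of layer l
y_tilde = dropout(y, p=0.3, rng=rng)  # about 30% of them are discarded
kept_fraction = sum(1 for v in y_tilde if v != 0.0) / len(y)
z = neuron_input([0.5, 0.5], [1.0, 0.0], b=0.1)
```

With $p = 0.3$ roughly 70% of the activations survive, and a discarded activation simply contributes nothing to the weighted sum of the next layer.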

In Eq. (14), $\tilde{y}^{(l)} $ represents the output of layer $l$ after neurons have been randomly discarded. Here, the network is optimized by gradient descent. The derivative of $focal\_ loss()$ is first taken to obtain the gradient, which is then used to update the model until it converges, as shown in Eq. (15).

(15)
$ \theta _{j+1} =\theta _{j} -lr\cdot \frac{\partial focal\_ loss()}{\partial \theta _{j} } . $

In Eq. (15), $\theta _{j} $ represents the model parameters at the $j$-th iteration, ${\partial focal\_ loss()/ \partial \theta _{j} }$ represents the calculated gradient, and $lr$ represents the network learning rate. In summary, the calculation process of the plant remote sensing image recognition model, which integrates the improved ResNet50 algorithm and binary mask segmentation, is shown in Fig. 5.
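The parameter update can be illustrated on a one-parameter toy problem. The sketch assumes a single logit $\theta$ whose sigmoid gives the true-class probability, approximates the gradient of the focal loss by a central difference, and moves $\theta$ against that gradient so that the loss decreases; the learning rate of 0.5, the value $\alpha = 1.0$, and all function names are illustrative choices rather than the settings of this study.

```python
import math


def focal_loss(p_t, alpha=1.0, gamma=2.3):
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def loss_of_theta(theta):
    """Focal loss of a sample whose true-class probability is sigmoid(theta)."""
    p_t = 1.0 / (1.0 + math.exp(-theta))
    return focal_loss(p_t)


def numerical_grad(f, theta, eps=1e-6):
    """Central-difference approximation of df/dtheta."""
    return (f(theta + eps) - f(theta - eps)) / (2 * eps)


def gradient_descent(f, theta, lr, steps):
    """Repeatedly step against the gradient and record the loss."""
    history = [f(theta)]
    for _ in range(steps):
        theta = theta - lr * numerical_grad(f, theta)
        history.append(f(theta))
    return theta, history


theta_final, losses = gradient_descent(loss_of_theta, theta=0.0, lr=0.5, steps=50)
```

Starting from a true-class probability of 0.5, each step increases $\theta$, and the recorded loss falls monotonically.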

Fig. 5. Calculation process of plant remote sensing image recognition model.


4. Performance Testing of Plant Remote Sensing Image Recognition Methods

After the plant remote sensing image recognition model was designed, a performance test was conducted to verify its performance. In addition, several algorithms commonly used for remote sensing image recognition were selected to construct comparative models for comparative analysis.

4.1. Test Plan Design

The dataset used in the test was divided into two parts. The first part was ImageNet, which was used in transfer learning. The second part consisted of plant remote sensing image data obtained through unmanned aerial vehicle (UAV) devices in this study. Seven UAVs were rented, and remote sensing images were collected in various landscape areas in China from 10:00 AM to 4:00 PM on sunny and partly cloudy days with a light breeze. A total of 866 plant remote sensing images were obtained, and the dataset was divided into training and test sets in a 7:3 ratio. The test experiment was also divided into two parts. The first part analyzed the mask quality of the adaptive threshold binary (ATB) algorithm, which is based on the maximum inter-class variance. The second part compared the plant recognition performance of various plant remote sensing image recognition models. The evaluation metrics were accuracy, precision, recall, F1 score, coefficient of determination, and the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. The parameters of the compared models were obtained by grid search within commonly used value ranges. The parameter settings of the designed model are shown in Table 2.

Table 2. Parameter value scheme for this design model.

| Number | Parameter | Values and rules | Number | Parameter | Values and rules |
|---|---|---|---|---|---|
| *01 | Sample size for single batch training | 64 | *06 | Maximum number of iterations | 350 |
| *02 | Learning rate | 0.00018 | *07 | Optimizer type | Small batch gradient descent |
| *03 | Loss function regularization coefficient | 2.3 | *08 | Parameter initialization method | Random initialization |
| *04 | Training mode | Graphics processing unit | *09 | Neuron loss rate | 0.3 |
| *05 | Does the hidden layer have an offset term | Y | / | / | / |
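As a small worked example of the test plan, the sketch below computes the sizes of a 7:3 split of the 866 images and evaluates accuracy, precision, recall, and F1 score from the counts of a binary confusion matrix. It assumes the 7:3 ratio refers to training and test sets; the confusion counts are invented for illustration and are not results of this study, and the function names are hypothetical.

```python
def split_sizes(n_total, train_ratio=0.7):
    """Number of training and test samples for a given split ratio."""
    n_train = round(n_total * train_ratio)
    return n_train, n_total - n_train


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 score from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


n_train, n_test = split_sizes(866)
# Made-up counts for one plant class on a test set of 260 images.
accuracy, precision, recall, f1 = classification_metrics(tp=90, fp=10, fn=5, tn=155)
```

Under this assumption the split gives 606 training and 260 test images, and the F1 score is the harmonic mean of precision and recall.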

4.2. Analysis of Test Results

Firstly, the mask experiment on plant remote sensing images was conducted. The Patch Match (PM) algorithm, the Criminisi algorithm, and the Deep Image Prior (DIP) algorithm were selected to construct comparative mask methods. In this experiment, 46 domestic and international image processing experts were invited to subjectively rate the mask results of each algorithm on a ten-point scale, with higher scores indicating better mask processing. The evaluation results are shown in Fig. 6. Fig. 6(a) displays the expert ratings for common plant categories, where the labels ``*1'' to ``*8'' represent magnolia, ginkgo, camphor tree, mimosa, cedar, crepe myrtle, rhododendron, and other plants, respectively. Fig. 6(b) describes the overall distribution of expert ratings for each mask algorithm. The ATB mask algorithm designed in this study received higher expert ratings on the common plant categories than the other three algorithms. Overall, the median expert ratings of the ATB, PM, Criminisi, and DIP algorithms were 9.23, 8.37, 7.18, and 8.74, respectively. In terms of score distribution, ratings above 90% of the full score account for 80% of the ATB algorithm's interval, while the proportions for the other algorithms are lower, which further indicates the algorithm's advantages in accuracy and visual effect.

Fig. 6. Evaluation of mask algorithm processing results for plant remote sensing image.


Next, the plant recognition models were analyzed, with the Dense Convolutional Network (DCN), Dual Attention Network (DAN), Spatial Pyramid Pooling Network (SPP-Net), and traditional ResNet50 network used as comparative models. The performance of the five models during the training phase is shown in Fig. 7. Figs. 7(a) and 7(b) show the accuracy and precision curves of the plant remote sensing image recognition models during the training phase, respectively. The x-axis of both subfigures represents the number of iterations, while the y-axis represents the corresponding indicator value. The line styles distinguish the models. ``DIP_IRN'' in Fig. 7 denotes the recognition model designed in this study, which combines the binary mask algorithm with the improved ResNet50 network. The DIP_IRN, DCN, DAN, SPP-Net, and ResNet50 models converged after more than 82, 201, 126, 208, and 139 iterations, respectively. At 300 iterations, the accuracy and precision of the DIP_IRN model were 97.8% and 98.1%, higher than those of the comparative models.

Fig. 7. Performance of five plant remote sensing image recognition models during training stage.


The recall rate and F1 score of the five models on the test set are shown in Fig. 8. The $x$-axis and $y$-axis have the same meanings as in Fig. 6. Figs. 8(a) and 8(b) show the recall rate and F1 score data, respectively. The DIP_IRN model had a mean recall rate of 97.8%, higher than the 93.6% of DCN and the values of the other models, showing its advantage in identifying all valid plant samples. The DIP_IRN model had a mean F1 score of 97.7%, which is higher than that of all the comparison models and further demonstrates its ability to balance precision and recall.

Fig. 8. Performance of five plant remote sensing image recognition models on the test set (Unit: %).


Furthermore, the ROC curves and corresponding AUC values of the various recognition models are compared in Fig. 9. The x-axis and y-axis in Fig. 9 represent the false positive rate and true positive rate. It can be observed that the AUC values of the ROC curves on the test set for DIP_IRN, DCN, DAN, SPP-Net, and ResNet50 were 73.5%, 69.5%, 68.3%, 67.7%, and 65.1%, respectively.
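For reference, the AUC of a ROC curve can be computed without drawing the curve, as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below covers only the binary case and uses invented labels and scores; it does not reproduce the evaluation of this study, and the function name is hypothetical.

```python
def auc(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly.
    Ties between a positive and a negative score count as half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = target plant class) and classifier scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
value = auc(labels, scores)   # 8 of the 9 positive-negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.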

Fig. 9. ROC curves and AUC comparison of five plant remote sensing image recognition models.


Finally, the computation time of each recognition model is compared in Fig. 10. The x-axis represents the number of samples tested, with a maximum of 260 images, and the y-axis represents the corresponding total computation time. The markers of different colors and styles represent the data points of each model, while the dashed lines of the corresponding color represent the polynomial regression curves fitted to all data points of that model. The gray dashed line is an auxiliary line. Owing to its large total number of internal parameters, the computation time of the traditional ResNet50 increases rapidly as the number of samples grows. After the SE channel attention module was added to ResNet50 in this study, the computational efficiency of the algorithm improved, and the total computation time of DIP_IRN grew linearly as the number of test samples increased. When tested on the entire test set, the total computation times of the DIP_IRN, DCN, DAN, SPP-Net, and ResNet50 models were 2618 ms, 2085 ms, 3164 ms, 387 ms, and 3406 ms, respectively.

Fig. 10. Comparison of training and testing time for five plant remote sensing image recognition models.


In summary, the recognition model designed in this study has superior plant remote sensing image recognition capability. The model can be used for rapid identification of plant species and their distribution over large areas, providing landscape designers with vegetation baseline maps that contain more comprehensive information. By accurately identifying existing plant species, designers can better understand the ecological environment of a specific area and make design decisions that fit it more naturally. In addition, the model can be used to monitor and evaluate vegetation changes after a design has been implemented. For example, by regularly monitoring plant communities through UAV remote sensing after a specific plant configuration has been implemented in an area, it is possible to observe the development of the vegetation and evaluate whether the design goals have been achieved, which is crucial for sustainable landscape management and maintenance. Finally, the model can help landscape designers make more scientific vegetation choices during the landscape planning phase. By analyzing remote sensing images, designers can understand which plants perform well in specific environments and which are not suitable, thereby guiding plant selection and configuration to achieve ecological and aesthetic harmony. The recognition results of plant image features by different methods are shown in Fig. 11.

Fig. 11. Recognition results of plant image features by different methods.


As shown in Fig. 11, SPP-Net performs well in processing images of different sizes, but its feature extraction process is complicated, which easily leads to information loss and makes it difficult to capture deep inter-layer features. DCN and DAN remain insufficient in expressing complex backgrounds and detailed features, especially when recognizing small plant objects. DIP_IRN combines the adaptive binary mask algorithm with the improved ResNet50 structure and shows high accuracy in feature extraction, which helps to improve the overall recognition performance. To verify the advanced nature of the proposed method, a model combining the UNet++ structure with an error backpropagation neural network (BPNN-UNet++) and a model combining a higher-order interactive submodule with a feedforward neural network (FFN-HB) were used for comparison. The results are shown in Table 3.

Table 3. Verification results of model advancement.

| Model | Accuracy | Precision | Recall | F1 Score | AUC |
|---|---|---|---|---|---|
| BPNN-UNet++ | 0.956 | 0.945 | 0.938 | 0.941 | 0.88 |
| FFN-HB | 0.962 | 0.95 | 0.945 | 0.947 | 0.89 |
| Improved ResNet50 | 0.978 | 0.981 | 0.978 | 0.977 | 0.93 |

The results in Table 3 show that the improved ResNet50 model outperforms the BPNN-UNet++ and FFN-HB models on all metrics, showing stronger overall performance, especially in the accuracy and precision of image classification. The recall rate (97.8%) and F1 score (97.7%) of the improved ResNet50 model are also higher than those of the BPNN-UNet++ and FFN-HB models; these are important indicators of model performance on imbalanced data. The AUC of the improved ResNet50 (0.93) is also higher than that of the other models, indicating that it is better at distinguishing between positive and negative samples and can identify different plant species more effectively. The experimental results demonstrate the superiority of the improved ResNet50 model in plant remote sensing image recognition tasks, both in the evaluation metrics and in practical application scenarios, which supports the advanced nature and practical value of the proposed method.

5. Conclusion

In this study, a plant remote sensing image recognition model for landscape design was designed, and the test results are as follows. The masking algorithm in the recognition model scored higher in expert evaluations on common plants than the other three comparison algorithms. Overall, the median expert scores for the ATB, PM, Criminisi, and DIP algorithms were 9.23, 8.37, 7.18, and 8.74, respectively. At 300 iterations, the accuracy and precision of the DIP_IRN model were 97.8% and 98.1%, respectively, both significantly higher than those of the comparison models. The average recall and F1 score of the DIP_IRN model on the various plant remote sensing images were 97.8% and 97.7%, which are 4.2 and 4.1 percentage points higher overall than those of the second-ranked DCN algorithm. The ROC curve AUC values of DIP_IRN, DCN, DAN, SPP-Net, and ResNet50 on the test set were 73.5%, 69.5%, 68.3%, 67.7%, and 65.1%, respectively. When tested on the entire test set, the total computation times of the DIP_IRN, DCN, DAN, SPP-Net, and ResNet50 models were 2618 ms, 2085 ms, 3164 ms, 387 ms, and 3406 ms, respectively. The recognition performance of the plant remote sensing image recognition model designed in this study is better than that of traditional and currently common models. Landscape designers can use this model to quickly obtain vegetation baseline maps and understand the distribution of plant communities at a landscape site, which provides support for landscape design. Due to testing limitations, the designed model was not deployed in an application-level product for further testing; this should be addressed in future research.

Funding

The research is supported by the General Project of Teaching Reform of Higher Education in Hunan Province in 2022: Research on Teaching Reform of Chinese Painting Education in Design Major in the Perspective of Traditional Craft Renaissance (No. HNJG-2022-0605); the General Project of Degree and Graduate Reform of Higher Education in Hunan Province in 2022: Exploration and Practice of Aesthetic Education for Chinese Painting in Art Design Major from the Perspective of Ideology and Politics (No. 2022JGSZ067); the General Project of Social Science Fund of Hunan Province in 2022: Research on the Folk Belief Space of Traditional Settlements in Meishan Region (No. 22YBA091); and the General Project of Teaching Reform of Higher Education in Hunan Province in 2020: Research on the Teaching Reform of the Course "Modern Educational Technology" for Public funded Normal University Students under the Background of Education Informatization 2.0 (No. HNJG-2020-1101).

REFERENCES

[1] J. Jing, Q. Ren, J. Zhou, and H. B. Song, ``AutoRSISC: Automatic design of neural architecture for remote sensing image scene classification,'' Pattern Recognition Letters, vol. 140, pp. 186-192, December 2020.
[2] D. Yan and M. Yan, ``Remote sensing landslide target detection method based on improved Faster R-CNN,'' Journal of Applied Remote Sensing, vol. 16, no. 4, 44521, 2022.
[3] Y. Liu, Y. Wei, S. Tao, Q. P. Dai, W. Y. Wang, and M. Q. Wu, ``Object-oriented detection of building shadow in TripleSat-2 remote sensing imagery,'' Journal of Applied Remote Sensing, vol. 14, no. 3, 36508, 2020.
[4] X. Feng, H. Fan, Y. Ming, T. X. Zhu, R. Bi, Z. H. Zhang, and Z. Y. Gao, ``Small object detection in remote sensing images based on super-resolution,'' Pattern Recognition Letters, vol. 153, pp. 107-112, January 2022.
[5] B. Chen, H. Sang, L. Xiang, S. Chen, and L. Yan, ``Image recognition based on multiscale pooling deep convolution neural networks,'' Complexity, vol. 2020, 6180317, 2020.
[6] G. Trotta, L. Cadez, F. Boscutti, M. Vuerich, E. Asquini, and G. Boscarol, ``Interpreting the shifts in forest structure, plant community composition, diversity, and functional identity by using remote sensing-derived wildfire severity,'' Fire Ecology, vol. 20, no. 1, pp. 1-17, 2024.
[7] N. V. Rygalova, T. G. Plutalova, and Y. V. Martynova, ``Assessment of the productivity parameters of plant communities in the steppe zone of Western Siberia obtained using remote sensing and dendrochronological methods,'' Arid Ecosystems, vol. 14, no. 2, pp. 169-176, 2024.
[8] Y. D. Guo, Y. Wang, W. Y. Fan, and G. Li, ``Integrated analysis of remote sensing with meteorological and health data for allergic rhinitis forecasting in Tianjin,'' International Journal of Biometeorology, vol. 68, no. 11, pp. 2307-2319, 2024.
[9] M. Gasela, M. Kganyago, and G. D. Jager, ``Using resampled nSight-2 hyperspectral data and various machine learning classifiers for discriminating wetland plant species in a Ramsar Wetland site, South Africa,'' Applied Geomatics, vol. 16, no. 2, pp. 429-440, 2024.
[10] A. B. Yismaw, W. S. Workie, D. G. Alamirew, and W. A. Ayenew, ``Current trend of water hyacinth expansion and investigation of possible cause for water hyacinth using remote sensing in the case study of Lake Tana, Ethiopia,'' Water, Air, and Soil Pollution, vol. 235, no. 7, pp. 1-16, 2024.
[11] D. Yi, J. Su, and W. H. Chen, ``Probabilistic Faster R-CNN with stochastic region proposing: Towards object detection and recognition in remote sensing imagery,'' Neurocomputing, vol. 459, no. 1, pp. 290-301, 2021.
[12] Q. Tan, B. Guo, J. Hu, X. F. Dong, and J. P. Hu, ``Object-oriented remote sensing image information extraction method based on multi-classifier combination and deep learning algorithm,'' Pattern Recognition Letters, vol. 141, pp. 32-36, January 2020.
[13] A. Wang, L. Xu, Y. Li, J. Y. Xing, X. R. Chen, K. Liu, Y. Liang, and Z. Zhou, ``Random-forest based adjusting method for wind forecast of WRF model,'' Computers & Geosciences, vol. 155, 104842, October 2021.
[14] B. K. Veettil, R. D. Ward, M. D. A. C. Lima, M. Stankovic, N. H. Pham, and X. Q. Ngo, ``Opportunities for seagrass research derived from remote sensing: A review of current methods,'' Ecological Indicators, vol. 117, 106560, October 2020.
[15] C. Hebbi and H. Mamatha, ``Comprehensive dataset building and recognition of isolated handwritten Kannada characters using machine learning models,'' Artificial Intelligence and Applications, vol. 1, no. 3, pp. 179-190, 2023.
[16] W. Xie, J. Lei, S. Fang, Y. S. Li, X. P. Jia, and M. G. Li, ``Dual feature extraction network for hyperspectral image analysis,'' Pattern Recognition, vol. 118, no. 7, 107992, 2021.
[17] K. Mayer, B. Rausch, M. L. Arlt, G. Gust, Z. Wang, D. Neumann, and R. Rajagopal, ``3D-PV-Locator: Large-scale detection of rooftop-mounted photovoltaic systems in 3D,'' Applied Energy, vol. 310, 118469, March 2022.
[18] M. Wei, J. Tang, H. Tang, R. Zhao, X. H. Gai, and R. Y. Lin, ``Adoption of convolutional neural network algorithm combined with augmented reality in building data visualization and intelligent detection,'' Complexity, vol. 2021, 5161111, 2021.

Author

Ying Liu

Ying Liu graduated from the School of Fine Arts, Hunan Normal University in 2008 with a master's degree in Chinese Painting Creation and Research. She is currently an associate professor and master's supervisor at the School of Design and Art, Changsha University of Science and Technology. She is also a member of the China Artists Association and the China Art Education Research Association. Her research interests include spatial art and art design.

Lin Liu

Lin Liu graduated from the School of Computer and Information Engineering, Tianjin Normal University in 2006 with a master's degree in Educational Technology. She is currently a full-time teacher at the College of Primary Education, Hunan First Normal University. Her research interests include information-based education and smart education.