Renbi Zhao*

(School of Management, Guangdong Nanfang Institute of Technology, Jiangmen, 529000, China; RenbiZhao@outlook.com)
 
            
            
            Copyright © The Institute of Electronics and Information Engineers(IEIE)
            
            
            
            
            
               
                  
Keywords
               
                Deep residual shrinkage network,  Travel,  Image retrieval,  Attention mechanism
             
            
          
         
            
                  1. Introduction	
With the rapid development of the Internet, people generate a large amount of image
                   data while traveling. Using these images for tourism image retrieval can provide
                   users with accurate and efficient information services, helping them choose tourism
                   destinations, plan travel routes, and so on. However, owing to the particular nature of
                   image data, tourism image retrieval faces several challenges, such as the high dimensionality
                   of image features, the richness of image semantics, and the measurement of similarity
                   between images [1,2]. To address these issues, this study improves the deep residual shrinkage network
                   (DRSN) and applies it to tourism image retrieval. DRSN is a deep-learning-based image
                   feature extraction network with good feature learning and generalization capabilities [3,4].
                   The study first uses a random deactivation (dropout) mechanism and an improved activation
                   function to enhance the DRSN so that it can extract more image features, and introduces
                   a global average pooling layer and an exponential linear activation function to strengthen
                   the network's learning ability and generalization performance. An attention mechanism is
                   then incorporated into the improved DRSN; the attention mechanism enables the network
                   to automatically learn the weights of different image features and to focus on the
                   features most relevant to the query image [5,6]. By improving the DRSN in this way, the research aims to enhance the representation
                   and retrieval performance of tourism image features and to provide a new and effective
                   solution for tourism image retrieval that meets the needs of practical applications.
                   The innovation of the research lies in introducing the attention mechanism to enhance
                   the ability of the DRSN to capture the semantic information of images: the backbone
                   extraction module is responsible for extracting semantic features, and the branch
                   mask extraction module is responsible for feature selection, thereby improving the
                   image retrieval capability of the model.
               
The first part of the study uses the deactivation mechanism and activation functions
                   to improve the deep residual shrinkage network. The second part combines the attention
                   mechanism with the improved deep residual shrinkage network to construct a tourism
                   image retrieval model. The third part verifies the performance of the constructed
                   model for tourism image retrieval through simulation experiments and practical applications.
                   The fourth part summarizes the experimental results and analyzes the advantages and
                   disadvantages of the research methods used.
               
             
            
                  2. Related Works	
               With the rapid development and popularization of internet technology, people's demand
                  for tourism information is also growing day by day. Tourism images, as an intuitive
                  and rich information carrier, play an important role in tourism information retrieval.
                  However, how to efficiently retrieve and manage a large amount of tourism image data
                  remains a challenge. In recent years, the rapid development of deep learning technology
                  has provided new solutions for image processing, and numerous experts and scholars
                   have conducted relevant research. Su et al. proposed a marketing method based on
                   tourism image retrieval to promote the development of ecotourism. The method first
                   collected tourist information through questionnaires and then processed images of
                   the scenic spots involved using the collected information. The results showed that
                   processing images with this method could significantly increase tourists' interest
                   in ecotourism [7]. Maree et al. designed a precise multilingual and multi-criteria semantic mobile
                   recommendation system to enhance tourists' retrieval experience of tourism services.
                   The system could provide users with various image search functions and used a database
                   to support its recommendations. The results indicated that the system could find the
                   corresponding image information based on tourists' queries [8]. Zheng et al. studied the emotional connection between travel purpose and purchase
                   intention and, based on coupling theory, examined the image of tourism destinations.
                   Retrieval related to travel purposes could generate certain expectations among tourists
                   about destination attributes, which the study analyzed through coupling theory. The
                   results indicated a clear coordination relationship between the evaluation values
                   of the tourism destination image and the product country image [9]. Ageeva et al. established a conceptual model of tourism image and tourist behavior
                   to promote the development of tourism. The model analyzed the image of local brands
                   at tourist destinations together with tourist supply and demand to find the correlation
                   between the two. The results indicated that the model could provide good support for
                   planning a destination's image [10].
               
To conduct research on image restoration, Zhou et al. applied a cycle-consistent
                   generative adversarial network to image enhancement and established a network model.
                   The model used three paths to extract features, thereby addressing problems of image
                   color difference, feature loss, and discrimination. The results indicated that the
                   image quality produced by the model was significantly improved [11]. Yan et al. designed an inversion method for coupled typical error sources in remote
                   sensing imaging to improve the imaging quality of optical systems. The method processed
                   images effectively using a modulation transfer function model and the decoupling principle
                   of coupled error sources. The results showed that the maximum relative error between
                   the inverted values of distorted remote sensing images coupled with typical error
                   sources and the true values was no more than 20%, with most errors below 10%,
                   indicating good inversion performance [12]. To improve image denoising techniques, Thakur et al. designed a model based on
                   Markov models combined with convolutional neural networks (CNNs). The model could
                   effectively and quickly use deep shrinkage to process image noise. The performance
                   of these CNN models was analyzed on the BSD-68 and Set-12 datasets, where PDNN showed
                   the best PSNR results [13].
               
In summary, research on tourism image retrieval based on a deep residual shrinkage
                   network improved with an attention mechanism is of great significance. By integrating
                   the attention mechanism with the DRSN, tourism images can be analyzed and processed
                   effectively to obtain accurate retrieval results. The research aims to provide stronger
                   support for tourism image retrieval and for the development of the tourism industry.
               
             
            
                  3. Construction of a Tourism Image Retrieval Model Based on Improved DRSN Network	
DRSN is a network with a residual structure and a self-attention mechanism, which has
                   shown excellent performance in image feature extraction and semantic understanding.
                   However, existing DRSNs still have some issues, such as weak robustness to noise and
                   interference and limited feature extraction capability. Therefore, a tourism image
                   retrieval model based on an improved DRSN was constructed to improve the performance
                   and efficiency of tourism image retrieval.
               
               
                     3.1 Image Feature Extraction Based on Improved DRSN
With the rapid development of the Internet, people are increasingly willing to obtain
                      information about tourist destinations online. Displaying tourism images can provide
                      a reference for tourists who are planning trips, thereby promoting the development
                      of the tourism industry. However, tourism images carry a great deal of information;
                      in addition to scenic-spot information, they also contain much irrelevant content
                      that can interfere with the accuracy of tourism image retrieval. Traditional DRSNs
                      therefore stack excessively deep network layers to filter out irrelevant information
                      and accumulate the features of useful information. Although this can reduce the extraction
                      of irrelevant features to some extent, it also degrades the neural network, leading
                      to vanishing gradients and overfitting, which in turn affect the accuracy of feature
                      extraction [14,15,16]. Based on this, the study uses a random deactivation mechanism and an activation function
                      to improve the DRSN for image feature extraction. The random deactivation mechanism
                      is a strategy used when training neural networks that reduces overfitting by randomly
                      deleting the outputs of some nodes during training. Specifically, for each layer of
                      the network, a portion of the nodes is retained and the remaining nodes are deleted.
                      During each iteration, after being processed by the deactivation mechanism, the DRSN
                      structure changes significantly, and each resulting structure can be treated as an
                      independent network. The output of the DRSN can then be regarded as the combination
                      of the predictions of all these independent networks. On this basis, the deactivation
                      mechanism reduces overfitting and improves the robustness of the network. The flowchart
                      of the deactivation mechanism is shown in Fig. 1.
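As an illustration only, the following minimal PyTorch-style sketch (an assumed implementation, not the authors' code) shows how the random deactivation mechanism can be applied to the output of a layer during training:

```python
import torch
import torch.nn as nn

class DeactivatedBlock(nn.Module):
    """Toy block showing the random deactivation (dropout) mechanism.

    A fraction `p` of the node outputs is randomly zeroed during training,
    so each iteration effectively trains a different sub-network; at
    inference time all nodes are kept.
    """
    def __init__(self, in_features: int, out_features: int, p: float = 0.5):
        super().__init__()
        self.fc = nn.Linear(in_features, out_features)
        self.act = nn.ELU()            # exponential linear activation (see Section 3.1)
        self.drop = nn.Dropout(p=p)    # random deactivation of node outputs

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.drop(self.act(self.fc(x)))

# Dropout is only active in training mode; model.eval() disables it.
block = DeactivatedBlock(256, 256, p=0.5)
features = block(torch.randn(8, 256))
```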
                  
                  
                        
                        
Fig. 1. Flowchart for handling deactivation mechanisms.
                      
After using the deactivation mechanism to mitigate overfitting in the DRSN, an activation
                      function can be used to further improve the network. To determine the effects of
                      different activation functions, two activation functions are selected for comparison:
                      the parametric rectified linear unit (PReLU) activation function and the exponential
                      linear activation function. The definition of the PReLU activation
                      function can be represented by formula (1).
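Assuming the standard PReLU form, consistent with the variable definitions that follow, this is

$$ f(x_{i})=\max (0,\,x_{i})+a_{i}\min (0,\,x_{i}) \qquad (1) $$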
                  
                  
In formula (1), $x_{i} $ represents the input of the $i$th channel of the PReLU activation
                      function, and $a_{i} $ represents the learnable parameter controlling the slope of
                      the negative half-axis. The PReLU activation function corrects the response by adding
                      only a small number of parameters, thereby reducing the risk of overfitting during
                      network fitting. The exponential linear activation function can be
                      defined using formula (2).
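Assuming the standard ELU definition, with $\alpha$ a positive scale hyperparameter that is not mentioned in the surrounding text, this is

$$ f(x_{j})=\begin{cases} x_{j}, & x_{j}>0 \\ \alpha \left(e^{x_{j}}-1\right), & x_{j}\le 0 \end{cases} \qquad (2) $$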
                  
                  
In formula (2), $x_{j} $ represents the input of the $j$th channel of the exponential linear activation
                      function. Comparing formula (1) with formula (2) shows that the exponential linear
                      activation function adapts more broadly to the positive and negative intervals: it
                      is unsaturated in the positive interval and can take negative values in the negative
                      interval. This means that when the output mean of the activation function is close
                      to 0, the robustness of the model to noise does not decrease. Therefore, the exponential
                      linear activation function is more suitable for image feature extraction than the
                      PReLU activation function. After the activation function is determined, the residual
                      structure of the DRSN is stacked. Once the activation function is added, some irreversible
                      information is lost between the input and output of the network; if the lost information
                      contains feature information, the performance of the network model is affected. The
                      study therefore uses identity mappings to keep the deep and shallow representations
                      in the DRSN consistent and uses residual connections to fuse the residual parts produced
                      during transmission across network layers [17]. This enables the network structure to adapt more flexibly to various data distributions
                      and patterns. A schematic diagram of the residual module used in the study is shown
                      in Fig. 2.
                  
                  
                        
                        
Fig. 2. Schematic diagram of residual module.
                      
Based on the analysis in Fig. 2, the study uses a three-layer residual module consisting of two $1\times1$ convolutional
                      layers and one $3\times3$ convolutional layer. The first $1\times1$ convolution reduces
                      the dimensionality of the input data from 256 channels to 64, and the second $1\times1$
                      convolution restores it after the $3\times3$ convolution. The residual structure can be represented
                      by formula (3).
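Assuming the standard residual formulation, consistent with the definitions that follow, this is

$$ x_{l+1}=x_{l}+F(x_{l},W_{l}) \qquad (3) $$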
                  
                  
In formula (3), $x_{l} $ represents the feature of the shallow unit, $F$ represents the residual function,
                      and $W_{l} $ represents the weights of the corresponding unit. By applying the residual
                      structure recursively, the feature expression
                      of any deep unit can be obtained, which can be represented by formula (4).
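Under the same assumption, unrolling formula (3) across units gives

$$ x_{L}=x_{l}+\sum _{i=l}^{L-1}F(x_{i},W_{i}) \qquad (4) $$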
                  
                  
In formula (4), $x_{L} $ represents the feature of any deep unit, $\sum _{i=l}^{L-1}F $
                      represents the accumulated residual functions between units $l$ and $L$, and $W_{i} $
                      represents the weights of the $i$th unit. By combining formula (3) with formula (4),
                      the expression of a deep feature as a sum of residual functions can be derived, which can be represented
                      by formula (5).
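Taking the shallow unit back to the network input ($l=0$), the assumed form is

$$ x_{L}=x_{0}+\sum _{i=0}^{L-1}F(x_{i},W_{i}) \qquad (5) $$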
                  
                  
In formula (5), $x_{0} $ represents the input feature, so the feature of any deep unit equals the input
                      plus the sum of the residual functions of all preceding network layers. Through this
                      processing, the gradient in the DRSN always exists, which enables feature recognition
                      and extraction in tourism images.
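A minimal sketch of the three-layer bottleneck residual module of Fig. 2, assuming a PyTorch implementation with ELU activations and the 256-to-64-to-256 channel layout described above (the class and layer names are illustrative assumptions), is:

```python
import torch
import torch.nn as nn

class BottleneckResidual(nn.Module):
    """1x1 (256->64) -> 3x3 (64->64) -> 1x1 (64->256) with an identity shortcut."""
    def __init__(self, channels: int = 256, reduced: int = 64):
        super().__init__()
        self.reduce = nn.Conv2d(channels, reduced, kernel_size=1, bias=False)
        self.conv = nn.Conv2d(reduced, reduced, kernel_size=3, padding=1, bias=False)
        self.restore = nn.Conv2d(reduced, channels, kernel_size=1, bias=False)
        self.act = nn.ELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = self.restore(self.act(self.conv(self.act(self.reduce(x)))))
        # Identity shortcut: deep and shallow features are fused by addition,
        # so the gradient can always flow through the skip connection.
        return self.act(x + residual)

y = BottleneckResidual()(torch.randn(1, 256, 56, 56))
```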
                  
                
               
                     3.2 Design of a Tourism Image Retrieval Model Combining Attention Mechanism and Improved
                     DRSN
                  
Analysis of the improved DRSN model shows that it focuses mainly on feature extraction
                      and dimensionality reduction and may fail to capture the semantic information of images.
                      Semantic information is clearly important in tourism image retrieval, so a method
                      is needed that extracts image features and semantic information simultaneously. Based
                      on this, the study introduces an attention mechanism into the improved DRSN to construct
                      a retrieval model for tourism images [18,19]. The DRSN model that integrates the attention mechanism can recognize the approximate
                      foreground position of target objects in tourism images. The attention module used
                      in the study consists of two parts: the backbone (trunk) extraction module and the
                      branch mask extraction module. The backbone extraction module performs feature extraction,
                      and the branch mask extraction module performs feature selection. The output of the
                      attention module based on these two modules
                      can be represented by formula (6).
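Assuming the standard trunk-and-mask attention formulation, consistent with the symbols defined below, this is

$$ H_{i,c}(x)=M_{i,c}(x)\cdot T_{i,c}(x) \qquad (6) $$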
                  
                  
In formula (6), $M(x)$ represents the mask produced by the branch extraction module, $i$ ranges over
                      all spatial positions in the feature map, $c$ represents the channel index, and $T(x)$
                      represents the output of the backbone extraction module. The branch mask extraction
                      module of the attention mechanism in the DRSN model performs feature selection during
                      forward propagation and also acts as a filter on gradients during backward propagation.
                      The mask gradient of
                      the input feature can be represented by formula (7).
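Assuming the usual mask-filtered gradient for this trunk-and-mask structure, this is

$$ \frac{\partial \big(M(x,\theta )\,T(x,\phi )\big)}{\partial \phi }=M(x,\theta )\,\frac{\partial T(x,\phi )}{\partial \phi } \qquad (7) $$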
                  
                  
In formula (7), $\theta $ represents the parameters of the mask branch and $\phi $ represents the
                      parameters of the backbone branch. This property of the mask ensures that the attention
                      module is not affected by noise and also prevents noise from corrupting the gradients
                      used to update the backbone branch parameters. However, it was found that simply stacking
                      attention modules in the DRSN degrades the performance of the attention module to a
                      certain extent. Therefore, the study applies an identity mapping to the attention mask
                      module, and the output
                      of the attention module can be updated to formula (8).
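Assuming the standard attention residual learning form, consistent with the description that follows, this is

$$ H_{i,c}(x)=\big(1+M_{i,c}(x)\big)\cdot F_{i,c}(x) \qquad (8) $$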
                  
                  
In formula (8), $F(x)$ represents the original features of the image and $F_{i,c} (x)$ represents the
                      residual function. The range of $M(x)$ is [0,1]; as $M(x)$ approaches 0, the output
                      of the attention module approaches the original image features. There are therefore
                      certain differences between the constructed attention residual module and the original
                      residual network in the model. These differences can be described using the residual
                      learning expression, which can be expressed using formula
                      (9).
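Assuming the standard residual learning expression, this is

$$ H_{i,c}(x)=x+F_{i,c}(x) \qquad (9) $$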
                  
                  
In the original residual network, $F_{i,c} (x)$ represents the residual function,
                      whereas in the attention residual module constructed in this study, $F_{i,c} (x)$ represents
                      the features produced by the convolutional network to which attention is applied.
                      In the model fusing the attention mechanism with the DRSN, the attention residual
                      module uses the mask branch as a feature selector, thereby preserving the performance
                      of the backbone extraction branch. At the same time, it can quickly pass the original
                      image features to the next layer to reduce the loss of feature information. To enable
                      the residual extraction module to collect more feature information from the image,
                      the attention residual module is added to the feature extraction stage of the original
                      feature extraction process [20]. The image feature extraction process based on the attention residual network model
                      is shown in Fig. 3.
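A rough sketch of such an attention residual module, assuming simplified placeholder architectures for the trunk and mask branches (the exact branch designs used in the study are not reproduced here), is:

```python
import torch
import torch.nn as nn

class AttentionResidualModule(nn.Module):
    """Output H(x) = (1 + M(x)) * F(x): trunk features plus mask-weighted trunk features."""
    def __init__(self, channels: int = 256):
        super().__init__()
        # Trunk (backbone extraction) branch: plain convolutional feature extraction.
        self.trunk = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.ELU(),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
        )
        # Mask (branch extraction) branch: produces soft weights in [0, 1].
        self.mask = nn.Sequential(
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.trunk(x)
        m = self.mask(x)       # M(x) in [0, 1]; as M(x) -> 0 the output stays close to f
        return (1 + m) * f     # attention residual learning, as in formula (8)

out = AttentionResidualModule()(torch.randn(1, 256, 28, 28))
```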
                  
                  
                        
                        
Fig. 3. Image feature extraction flowchart based on attention residual network model.
                      
During feature extraction, each feature map is compressed into a single real-valued
                      descriptor that summarizes the information of the corresponding channel. By performing
                      global pooling over these feature maps,
                      a channel weight vector can be obtained, which can be represented by formula (10).
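Assuming the standard global average pooling (squeeze) operation, consistent with the symbols defined below, this is

$$ z_{c}=\frac{1}{H\times W}\sum _{i=1}^{H}\sum _{j=1}^{W}u_{c}(i,j) \qquad (10) $$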
                  
                  
In formula (10), $H$ represents the height of the feature map, $W$ represents its width, $u$ represents
                      the feature map being pooled, $z$ represents the resulting global attention value,
                      and $c$ is the channel index. After the vector weights are obtained, they are activated
                      using the exponential linear activation function, which
                      can be represented by formula (11).
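Assuming a standard squeeze-and-excitation style weighting, with $\delta (\cdot )$ the exponential linear activation and $\sigma (\cdot )$ the sigmoid function (both assumptions), this is

$$ s=\sigma \big(W_{2}\,\delta (W_{1}z)\big) \qquad (11) $$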
                  
                  
In formula (11), $W_{1} z$ represents the first weighting and activation operation over the pooled vector,
                      and $W_{2} $ represents the weights of the second network layer. The importance of each
                      feature map can be calculated using formula (11). Based on the above analysis, the
                      tourism image retrieval process that fuses the attention mechanism with the improved
                      DRSN is shown in Fig. 4.
                  
                  
                        
                        
Fig. 4. Flowchart of tourism image retrieval integrating the attention mechanism
                            and the improved DRSN.
                        
                      
                
             
            
                  4. Performance Analysis of Tourism Image Retrieval Model Based on Improved DRSN	
To verify the performance of the tourism image retrieval model, the Landscape-dataset
                   was used to test the model. The performance of the tourism image retrieval model based
                   on the improved DRSN was evaluated by training the model on this dataset.
               
               
                     4.1 Performance Analysis of Tourism Image Retrieval Models
To verify the performance of the tourism image retrieval model, the study visualized
                      the image features extracted by the model at each stage to give a more intuitive view
                      of how the neural network extracts features. Additional feature extraction results
                      were then used to analyze the feature extraction capability of the model. The visualization
                      results of feature extraction on landscape samples
                      are shown in Fig. 5.
                  
                  
                        
                        
Fig. 5. Visualization results of feature extraction from landscape samples.
                      
From Fig. 5, it can be seen that, after introducing the attention mechanism and the other improvements,
                      the feature extraction effect of the proposed model is very pronounced: non-feature
                      areas are suppressed so that the contours of the features stand out more clearly.
                      To verify the performance of the improved DRSN retrieval model, the Landscape-dataset
                      (https://github.com/koishi70/Landscape-Dataset) was used for testing; 6000 images were
                      selected and divided into two samples of higher and lower complexity, with 3000 tourism
                      images per sample. The Landscape-dataset is a large-scale natural landscape image dataset
                      created and maintained by the developer Yuweiming70. It contains tens of thousands of
                      high-quality landscape images that are carefully labeled and classified according to
                      different geographical environments and weather conditions. Each category has a large
                      number of samples, ensuring diversity and generalization ability during training. The
                      clear structure of the dataset provides a resource for researchers in deep learning
                      and computer vision to train and test models, especially for tasks such as landscape
                      classification and object detection. The comparison of the loss values of the three
                      methods on the two samples is shown in Fig. 6.
                  
                  
                        
                        
Fig. 6. Comparison of loss values of three methods on two datasets.
                      
According to Fig. 6(a), in Landscape-dataset sample 1 the loss value of the improved DRSN became relatively
                      stable after 51 iterations, with an average loss of 0.21. The fluctuations of the RNN
                      and CNN slowed after 49 and 47 iterations, respectively, but their amplitude remained
                      relatively large, with average losses of 0.48 and 0.62. From Fig. 6(b), in Landscape-dataset sample 2 the loss value of the improved DRSN flattened at 39
                      iterations and stabilized at 157 iterations, with an average loss of 0.16. The fluctuations
                      of the RNN and CNN slowed after 47 and 41 iterations, respectively, but their amplitude
                      remained relatively large, with average losses of 0.51 and 0.93. This indicated that
                      the tourism image retrieval model based on the improved DRSN was more robust. To verify
                      the accuracy of the retrieval model for image retrieval on this dataset, the study also
                      compared the image retrieval accuracy of the above methods. The comparison of image
                      retrieval accuracy of the three methods on the two samples is shown in Table 1.
                  
In Table 1, in the sample 1 test, the accuracy of the improved DRSN stabilized after 118 iterations
                      at 94.52%. The accuracy of the RNN stabilized after 135 iterations at 83.94%, and that
                      of the CNN stabilized after 157 iterations at 72.52%. In the sample 2 test, the image
                      retrieval accuracy of the improved DRSN was 90.88%, while the image retrieval accuracies
                      of the RNN and CNN were 81.72% and 75.88%, respectively. This indicated that, in terms
                      of image retrieval capability, the image retrieval model based on the improved DRSN
                      had higher accuracy and could increase the probability of images being retrieved. To
                      further validate the performance of the model in image retrieval, the study compared
                      image precision and recall as validation metrics. As shown in Fig. 7, the comparison results of the precision and recall of the three methods in the image
                      retrieval process are presented.
                  
                  
                        
                        
Table 1. Comparison of image retrieval accuracy of three methods on two datasets.

                                  | Algorithm | Data set | Iterations required to reach convergence | Retrieval accuracy (%) |
                                  | CNN | Sample 1 | 157 | 72.52 |
                                  | CNN | Sample 2 | 182 | 75.88 |
                                  | RNN | Sample 1 | 135 | 83.94 |
                                  | RNN | Sample 2 | 164 | 81.72 |
                                  | Improved DRSN | Sample 1 | 118 | 94.52 |
                                  | Improved DRSN | Sample 2 | 106 | 90.88 |
                        
                     
                   
                  
                        
                        
Fig. 7. Comparison results of precision and recall of three methods in two datasets.
                      
According to Fig. 7(a), all three methods performed well in tourism image retrieval. The image retrieval
                      precision of the improved DRSN was 92.61%, while the precisions of the RNN and CNN
                      were 88.95% and 86.13%, respectively. According to Fig. 7(b), all three methods also achieved good recall in tourism image retrieval. The recall
                      of the improved DRSN was 96.48%, while the recalls of the RNN and CNN were 91.05% and
                      89.22%, respectively. This indicated that the tourism image retrieval model based on
                      the improved DRSN had strong robustness and applicability.
                  
                
               
                     4.2 Application Performance Analysis of Tourism Image Retrieval Models
To verify the practical application performance of the tourism image retrieval model,
                      this study compared it with the Average Hash Algorithm (AHA). AHA converts an image
                      into a grayscale image, calculates the average brightness of the image, and then compares
                      the value of each pixel with the average brightness to generate a hash value, thereby
                      achieving image retrieval while maintaining high accuracy and efficiency. The volume
                      of tourism data is huge, and the efficiency of the model is an important consideration;
                      the fast computation of AHA and its independence from image size make it efficient when
                      processing large amounts of data, which matches practical applications well. Therefore,
                      AHA was chosen as the comparison algorithm for the study. In tourism images, different
                      lighting conditions and occlusion can affect image retrieval and recognition, so the
                      study compared the retrieval model with AHA to verify its performance in tourism image
                      retrieval under different environments. As shown in Fig. 8, the recognition accuracy of the two methods for the same tourism images in different
                      environments is presented.
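For reference, a minimal average-hash sketch consistent with the description above is given below; the 8×8 resizing and Hamming-distance comparison are common conventions assumed here rather than details taken from the study:

```python
import numpy as np
from PIL import Image

def average_hash(path: str, hash_size: int = 8) -> np.ndarray:
    """Average Hash: resize, convert to grayscale, threshold each pixel against the mean."""
    img = Image.open(path).convert("L").resize((hash_size, hash_size))
    pixels = np.asarray(img, dtype=np.float32)
    return (pixels > pixels.mean()).flatten()   # 64-bit boolean hash

def hamming_distance(h1: np.ndarray, h2: np.ndarray) -> int:
    """Smaller distance means the two images are more likely to match."""
    return int(np.count_nonzero(h1 != h2))

# Retrieval works by ranking database images by Hamming distance to the query hash.
```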
                  
                  
                        
                        
Fig. 8. The recognition accuracy of two methods for the same tourism image in different
                           environments.
                        
                      
As shown in Fig. 8(a), under normal lighting conditions the improved DRSN had the highest recognition
                      accuracy for tourism images: of 300 images, 269 were accurately identified, for an
                      average accuracy of 89.51%, while AHA accurately identified 249 of the 300 images,
                      for an average accuracy of 82.97%. As shown in Fig. 8(b), the image recognition accuracy of the improved DRSN was also higher than that of
                      AHA for tourism images in dimly lit environments: the average accuracy of the retrieval
                      model was 59.64%, with 179 of 300 images accurately identified, while AHA accurately
                      identified 157 of 300 images, for an average accuracy of 52.33%. According
                      to Fig. 8(c), in an occluded environment the accuracy of both methods decreased significantly,
                      indicating that occlusion has a considerable impact on image retrieval; the two methods
                      accurately recognized 124 and 116 of the 300 images, with accuracy rates of 41.25% and
                      38.66%, respectively. To verify the effect of images captured from different angles on
                      the model's retrieval, experiments were conducted in three further settings, and
                      the results are shown in Fig. 9.
                  
                  
                        
                        
Fig. 9. Comparison of recognition results of the two methods at different
                            observation angles.
                        
                      
In Fig. 9(a), the average recognition rate of the improved DRSN reached 93.52% at a horizontal
                      angle, with 468 of 500 images accurately identified, while AHA accurately identified
                      444 of 500 images, for an average recognition rate of 89.07%. As shown
                      in Fig. 9(b), the retrieval ability of both methods decreased slightly at an elevation angle:
                      the average recognition rate of the improved DRSN was 79.68%, while that of the traditional
                      method was 73.14%, with 400 and 366 of the 500 images accurately recognized, respectively.
                      As shown in Fig. 9(c), under vertical rotation the image retrieval ability of both methods decreased significantly;
                      they accurately recognized 216 and 201 of the 500 images, with recognition accuracies
                      of 43.27% and 40.04%, respectively. This verified that the retrieval model retained
                      higher recognition ability than AHA under different shooting angles. To further verify
                      the operational capability of the retrieval model, the study used retrieval time as a
                      comparative indicator and compared the time consumption of the improved DRSN and AHA
                      with the actual values. As shown in Fig. 10, the time consumption results of the three methods in tourism image retrieval are presented.
                  
                  
                        
                        
Fig. 10. Comparison of time consumption of three methods in datasets with different
                           amounts of data.
                        
                      
According to the comparative analysis of Figs. 10(a), 10(b), and 10(c), the retrieval time of the true value was 0.95 s,
                      the retrieval time of the improved DRSN was 1.28 s, and the retrieval time of AHA was
                      1.46 s. In the time dimension, the improved DRSN was closest to the true value, with
                      a difference of 0.27 s, indicating that the retrieval model designed in the study could
                      retrieve tourism images effectively.
                  
                
             
            
                  5. Conclusion	
To improve the retrieval and recognition of tourism images, an improved DRSN based
                   on an attention mechanism was proposed and used to design an image retrieval and
                   recognition model. The study first used the deactivation mechanism and activation
                   functions to improve the DRSN and applied it to feature extraction from tourism images;
                   an attention mechanism was then introduced on top of the improved network to construct
                   the retrieval model. The results showed that under normal lighting, low lighting, and
                   occlusion, the image retrieval accuracy of the retrieval model was 89.51%, 59.64%, and
                   41.25%, respectively. At horizontal, elevation, and vertical rotation angles, the model's
                   retrieval and recognition accuracy was 93.52%, 79.68%, and 43.27%, respectively. The
                   retrieval model completed image retrieval in 1.28 s, which was very close to the true
                   value, with a difference of 0.27 s. On all comparison indicators, the performance of
                   the retrieval model was superior to that of the comparison methods. This indicated that
                   the proposed attention-based improved DRSN tourism image retrieval method has clear
                   advantages in retrieval accuracy and efficiency, providing a new and effective solution
                   for tourism image retrieval that is of great significance for practical applications.
                   However, the research still has certain shortcomings: the study retrieved images from
                   only a limited number of datasets. In the future, the application of this method to
                   other datasets can be explored and its performance and robustness verified through
                   more experiments.
               
             
          
         
            
                  
                     REFERENCES
                  
                     
                        
                        C. C. Chiu, W. J. Wei, L. C. Lee, and J. C. Lu, ``Augmented reality system for tourism
                           using image-based recognition,'' Microsystem Technologies, vol. 27, no. 4, pp. 1811-1826,
                           2021.

 
                     
                        
                        S. Zulzilah, E. Prihantoro, and S. Masitoh, ``The image tourism destinations of Bandung
                           in social media network,'' International Journal of Multicultural and Multireligious
                           Understanding, vol. 6, no. 10, pp. 72-83, 2019.

 
                     
                        
                        M. Hasanvand, M. Nooshyar, E. Moharamkhani, and A. Selyari, ``Machine learning methodology
                           for identifying vehicles using image processing,'' Artificial Intelligence and Applications,
                           vol. 1, no. 3, pp. 170-178, 2023.

 
                     
                        
Y. Duan, J. Wang, H. Ma, and Y. Sun, ``Residual convolutional graph neural network
                           with subgraph attention pooling,'' Tsinghua Science and Technology, vol. 27, no. 4,
                           pp. 653-663, 2021.

 
                     
                        
                        Z. Yang, J. Shang, Z. Zhang, Y. Zhang, and S. Liu, ``A new end-to-end image dehazing
                           algorithm based on residual attention mechanism,'' Gongye Daxue Xuebao/Journal of
                           Northwestern Polytechnical University, vol. 39, no. 4, pp. 901-908, 2021.

 
                     
                        
                        H. Han, L. Zhuo, J. Li, J. Zhang, and M. Wang, ``Blind image quality assessment with
                           channel attention based deep residual network and extended LargeVis dimensionality
                           reduction,''  Journal of Visual Communication and Image Representation, vol. 80, no.
                           10, 103296, 2021.

 
                     
                        
                        X. Su, Q. Zheng, Q. Zheng, and W. Xu, ``Effects of environmental attractiveness and
tourism image cognition of ecotourism on customer satisfaction,'' Journal of Environmental
                            Protection and Ecology, vol. 21, no. 2, pp. 783-789, 2020.

 
                     
                        
                        M. Maree, A. Rattrout, M. Altawil, and M. Belkhatir, ``Multi-modality search and recommendation
                           on Palestinian cultural heritage based on the holy-land ontology and extrinsic semantic
resources,'' Journal on Computing and Cultural Heritage (JOCCH), vol. 14, no. 3, pp.
                           1-23, 2021.

 
                     
                        
                        P. Zheng, J. Li, J. Wang, H. Cheng, and Q. Wang, ``The coupling coordination of relationships
                           between tourism destination image and product country image,'' International Journal
                           of Tourism Research, vol. 23, no. 5, pp. 858-870, 2021.

 
                     
                        
                        E. Ageeva and P. Foroudi, ``Tourists' destination image through regional tourism:
                           From supply and demand sides perspectives,'' Journal of Business Research, vol. 101,
                           pp. 334-348, 2019.

 
                     
                        
                        D. Zhou, Y. Qian, Y. Ma, Y. Fan, J. Yang, and F. Tan, ``Low illumination image enhancement
                           based on multi-scale CycleGAN with deep residual shrinkage,'' Journal of Intelligent
                           & Fuzzy Systems, vol. 42, no. 3, pp. 2383-2395, 2022.

 
                     
                        
                        J. Yan, M. Shi, X. Lv, Y. Zhang, and Y. Ma, ``An inversion method for coupled typical
                           error sources based on remote sensing image,'' Journal of Imaging Science & Technology,
                           vol. 66, no. 6, 060503, 2022.

 
                     
                        
                        R. S. Thakur, R. N. Yadav, and L. Gupta, ``State‐of‐art analysis of image denoising
                           methods using convolutional neural networks,'' IET Image Processing, vol. 13, no.
                           13, pp. 2367-2380, 2019.

 
                     
                        
                        W. Xie, M. Cui, M. Liu, P. Wang, and B. Qiang, ``Deep hashing multi-label image retrieval
with attention mechanism,'' International Journal of Robotics & Automation, vol. 37,
                           no. 4, pp. 372-381, 2022.

 
                     
                        
                        L. Shan, M. Yu, J. Xia, J. Xin, C. Deng, and L. Zhu, ``Overlapped spectral demodulation
                           of fiber Bragg grating using convolutional time-domain audio separation network,''
                           Optical Engineering, vol. 62, no. 6, 066104, 2023.

 
                     
                        
                        E. V. Diana and M. Sumathi, ``An intelligent deep learning architecture using multi-scale
                           residual network model for image interpolation,'' Journal of Advances in Information
                           Technology, vol. 14, no. 5, pp. 970-979, 2023.

 
                     
                        
                        Q. Wang, J. Lai, Z. Yang, K. Xu, and L. Lei, ``Improving cross-dimensional weighting
                           pooling with multi-scale feature fusion for image retrieval,'' Neurocomputing, vol.
                           363, no. 10, pp. 17-26, 2019.

 
                     
                        
                        Y. Zhu, Y. Wang, H. Chen, Z. Zuo, and Q. Huang, ``Large-scale image retrieval with
                           deep attentive global features,'' International Journal of Neural Systems, vol. 33,
                           no. 3, pp. 13-30, 2023.

 
                     
                        
                        Y. Li, Z. He, Z. Zhang, W. Zhang, P. Chatterjee, and D. Pamucar, ``A novel feature
                           aggregation approach for image retrieval using local and global features,'' CMES-Computer
                           Modeling in Engineering & Sciences, vol. 131, no. 1, pp. 239-262, 2022.

 
                     
                        
                        Z. Wang, ``Video summarization generation with self-attention and random forest regression,''
                           Proc of Second International Symposium on Computer Applications and Information Systems
(ISCAIS 2023), SPIE, vol. 12721, no. 6, pp. 349-356, 2023.

 
                   
                
             
            Author
            
            
Renbi Zhao graduated from Guangdong Polytechnic Normal University in 2002 with a
               bachelor’s degree in tourism management and service education. Currently, she holds
               the position of Dean of the School of Management at Guangdong Nanfang Institute of
               Technology with the title of Associate Professor. She is recognized as an Outstanding
               Teacher in Private Education in Guangdong Province, an Outstanding Young Science and
               Technology Pioneer in Jiangmen City, and an Outstanding Teacher in Jiangmen City.
               She has published over twenty papers in various national journals, with her research
               primarily focusing on tourism culture and tourism
               			resource development.