Renbi Zhao*

(School of Management, Guangdong Nanfang Institute of Technology, Jiangmen, 529000, China; RenbiZhao@outlook.com)
 
            
            
            Copyright © The Institute of Electronics and Information Engineers(IEIE)
            
            
            
            
            
               
                  
Keywords
               
                Deep residual shrinkage network,  Travel,  Image retrieval,  Attention mechanism
             
            
          
         
            
                  1. Introduction	
With the rapid development of the Internet, people generate a large amount of image
                   data while traveling. Using these images for tourism image retrieval can provide
                   users with accurate and efficient information services, helping them choose tourism
                   destinations, plan travel routes, and so on. However, owing to the particular nature of
                   image data, tourism image retrieval faces several challenges, such as the high dimensionality
                   of image features, the richness of image semantics, and the measurement of similarity
                   between images [1,2]. To address these issues, this study improves the deep residual shrinkage network
                   (DRSN) and applies it to tourism image retrieval. DRSN is a deep-learning-based image
                   feature extraction network with good feature learning and generalization capabilities [3,4].
                   The study first uses a random deactivation (dropout) mechanism and an improved activation
                   function to enhance the DRSN so that it can extract more image features, and introduces
                   a global average pooling layer and an exponential linear activation function to strengthen
                   the network's learning ability and generalization performance. An attention mechanism is
                   then incorporated into the improved DRSN; the attention mechanism enables the network
                   to automatically learn the weights of different image features and to focus on the
                   features most relevant to the query image [5,6]. By improving the DRSN in this way, the research aims to enhance the representation
                   and retrieval performance of tourism image features and to provide a new and effective
                   solution for tourism image retrieval that meets the needs of practical applications.
                   The innovation of the research lies in introducing the attention mechanism to enhance
                   the ability of the DRSN to capture the semantic information of images: the backbone
                   extraction module is responsible for extracting semantic features, and the branch
                   mask extraction module is responsible for feature selection, thereby improving the
                   image retrieval capability of the model.
               
The first part of the study uses the deactivation mechanism and activation functions
                   to improve the deep residual shrinkage network. The second part combines the attention
                   mechanism with the improved deep residual shrinkage network to construct a tourism
                   image retrieval model. The third part verifies the performance of the constructed
                   model for tourism image retrieval through simulation experiments and practical applications.
                   The fourth part summarizes the experimental results and analyzes the advantages and
                   disadvantages of the research methods used.
               
             
            
                  2. Related Works	
               With the rapid development and popularization of internet technology, people's demand
                  for tourism information is also growing day by day. Tourism images, as an intuitive
                  and rich information carrier, play an important role in tourism information retrieval.
                  However, how to efficiently retrieve and manage a large amount of tourism image data
                  remains a challenge. In recent years, the rapid development of deep learning technology
                  has provided new solutions for image processing, and numerous experts and scholars
                   have conducted relevant research. Su et al. proposed a marketing method based on
                   tourism image retrieval to promote the development of ecotourism. The method first
                   collected tourist information through questionnaires and then processed images of
                   the scenic spots involved using the collected information. The results showed that
                   processing images with this method could significantly increase tourists' interest
                   in ecotourism [7]. Maree et al. designed a precise multilingual and multi-criteria semantic mobile
                   recommendation system to enhance tourists' retrieval experience of tourism services.
                   The system could provide users with various image search functions and used a database
                   to support its recommendations. The results indicated that the system could find the
                   corresponding image information based on tourists' queries [8]. Zheng et al. studied the emotional connection between travel purpose and purchase
                   intention and, based on coupling theory, examined the image of tourism destinations.
                   Retrieval related to travel purposes could generate certain expectations among tourists
                   about destination attributes, which the study analyzed through coupling theory. The
                   results indicated a clear coordination relationship between the evaluation values
                   of the tourism destination image and the product country image [9]. Ageeva et al. established a conceptual model of tourism image and tourist behavior
                   to promote the development of tourism. The model analyzed the image of local brands
                   at tourist destinations together with tourist supply and demand to find the correlation
                   between the two. The results indicated that the model could provide good support for
                   planning a destination's image [10].
               
To conduct research on image restoration, Zhou et al. applied a cycle-consistent
                   generative adversarial network to image enhancement and established a network model.
                   The model used three paths to extract features, thereby addressing problems of image
                   color difference, feature loss, and discrimination. The results indicated that the
                   image quality produced by the model was significantly improved [11]. Yan et al. designed an inversion method for coupled typical error sources in remote
                   sensing imaging to improve the imaging quality of optical systems. The method processed
                   images effectively using a modulation transfer function model and the decoupling principle
                   of coupled error sources. The results showed that the maximum relative error between
                   the inverted values of distorted remote sensing images coupled with typical error
                   sources and the true values was no more than 20%, with most errors below 10%,
                   indicating good inversion performance [12]. To improve image denoising techniques, Thakur et al. designed a model based on
                   Markov models combined with convolutional neural networks (CNNs). The model could
                   effectively and quickly use deep shrinkage to process image noise. The performance
                   of these CNN models was analyzed on the BSD-68 and Set-12 datasets, where PDNN showed
                   the best PSNR results [13].
               
In summary, research on tourism image retrieval based on a deep residual shrinkage
                   network improved with an attention mechanism is of great significance. By integrating
                   the attention mechanism with the DRSN, tourism images can be analyzed and processed
                   effectively to obtain accurate retrieval results. The research aims to provide stronger
                   support for tourism image retrieval and for the development of the tourism industry.
               
             
            
                  3. Construction of a Tourism Image Retrieval Model Based on Improved DRSN Network	
DRSN is a network with a residual structure and a self-attention mechanism, which has
                   shown excellent performance in image feature extraction and semantic understanding.
                   However, existing DRSNs still have some issues, such as weak robustness to noise and
                   interference and limited feature extraction capability. Therefore, a tourism image
                   retrieval model based on an improved DRSN was constructed to improve the performance
                   and efficiency of tourism image retrieval.
               
               
                     3.1 Image Feature Extraction Based on Improved DRSN
With the rapid development of the Internet, people are increasingly willing to obtain
                      information about tourist destinations online. Displaying tourism images can provide
                      a reference for tourists who are planning trips, thereby promoting the development
                      of the tourism industry. However, tourism images carry a great deal of information;
                      in addition to scenic-spot information, they also contain much irrelevant content
                      that can interfere with the accuracy of tourism image retrieval. Traditional DRSNs
                      therefore stack excessively deep network layers to filter out irrelevant information
                      and accumulate the features of useful information. Although this can reduce the extraction
                      of irrelevant features to some extent, it also degrades the neural network, leading
                      to vanishing gradients and overfitting, which in turn affect the accuracy of feature
                      extraction [14,15,16]. Based on this, the study uses a random deactivation mechanism and an activation function
                      to improve the DRSN for image feature extraction. The random deactivation mechanism
                      is a strategy used when training neural networks that reduces overfitting by randomly
                      deleting the outputs of some nodes during training. Specifically, for each layer of
                      the network, a portion of the nodes is retained and the remaining nodes are deleted.
                      During each iteration, after being processed by the deactivation mechanism, the DRSN
                      structure changes significantly, and each resulting structure can be treated as an
                      independent network. The output of the DRSN can then be regarded as the combination
                      of the predictions of all these independent networks. On this basis, the deactivation
                      mechanism reduces overfitting and improves the robustness of the network. The flowchart
                      of the deactivation mechanism is shown in Fig. 1.
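As an illustration only, the following minimal PyTorch-style sketch (an assumed implementation, not the authors' code) shows how the random deactivation mechanism can be applied to the output of a layer during training:

```python
import torch
import torch.nn as nn

class DeactivatedBlock(nn.Module):
    """Toy block showing the random deactivation (dropout) mechanism.

    A fraction `p` of the node outputs is randomly zeroed during training,
    so each iteration effectively trains a different sub-network; at
    inference time all nodes are kept.
    """
    def __init__(self, in_features: int, out_features: int, p: float = 0.5):
        super().__init__()
        self.fc = nn.Linear(in_features, out_features)
        self.act = nn.ELU()            # exponential linear activation (see Section 3.1)
        self.drop = nn.Dropout(p=p)    # random deactivation of node outputs

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.drop(self.act(self.fc(x)))

# Dropout is only active in training mode; model.eval() disables it.
block = DeactivatedBlock(256, 256, p=0.5)
features = block(torch.randn(8, 256))
```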
                  
                  
                        
                        
Fig. 1. Flowchart for handling deactivation mechanisms.
                      
After using the deactivation mechanism to mitigate overfitting in the DRSN, an activation
                      function can be used to further improve the network. To determine the effects of
                      different activation functions, two activation functions are selected for comparison:
                      the parametric rectified linear unit (PReLU) activation function and the exponential
                      linear activation function. The definition of the PReLU activation
                      function can be represented by formula (1).
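Assuming the standard PReLU form, consistent with the variable definitions that follow, this is

$$ f(x_{i})=\max (0,\,x_{i})+a_{i}\min (0,\,x_{i}) \qquad (1) $$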
                  
                  
In formula (1), $x_{i} $ represents the input of the $i$th channel of the PReLU activation
                      function, and $a_{i} $ represents the learnable parameter controlling the slope of
                      the negative half-axis. The PReLU activation function corrects the response by adding
                      only a small number of parameters, thereby reducing the risk of overfitting during
                      network fitting. The exponential linear activation function can be
                      defined using formula (2).
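Assuming the standard ELU definition, with $\alpha$ a positive scale hyperparameter that is not mentioned in the surrounding text, this is

$$ f(x_{j})=\begin{cases} x_{j}, & x_{j}>0 \\ \alpha \left(e^{x_{j}}-1\right), & x_{j}\le 0 \end{cases} \qquad (2) $$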
                  
                  
In formula (2), $x_{j} $ represents the input of the $j$th channel of the exponential linear activation
                      function. Comparing formula (1) with formula (2) shows that the exponential linear
                      activation function adapts more broadly to the positive and negative intervals: it
                      is unsaturated in the positive interval and can take negative values in the negative
                      interval. This means that when the output mean of the activation function is close
                      to 0, the robustness of the model to noise does not decrease. Therefore, the exponential
                      linear activation function is more suitable for image feature extraction than the
                      PReLU activation function. After the activation function is determined, the residual
                      structure of the DRSN is stacked. Once the activation function is added, some irreversible
                      information is lost between the input and output of the network; if the lost information
                      contains feature information, the performance of the network model is affected. The
                      study therefore uses identity mappings to keep the deep and shallow representations
                      in the DRSN consistent and uses residual connections to fuse the residual parts produced
                      during transmission across network layers [17]. This enables the network structure to adapt more flexibly to various data distributions
                      and patterns. A schematic diagram of the residual module used in the study is shown
                      in Fig. 2.
                  
                  
                        
                        
Fig. 2. Schematic diagram of residual module.
                      
Based on the analysis in Fig. 2, the study uses a three-layer residual module consisting of two $1\times1$ convolutional
                      layers and one $3\times3$ convolutional layer. The first $1\times1$ convolution reduces
                      the dimensionality of the input data from 256 channels to 64, and the second $1\times1$
                      convolution restores it after the $3\times3$ convolution. The residual structure can be represented
                      by formula (3).
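Assuming the standard residual formulation, consistent with the definitions that follow, this is

$$ x_{l+1}=x_{l}+F(x_{l},W_{l}) \qquad (3) $$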
                  
                  
In formula (3), $x_{l} $ represents the feature of the shallow unit, $F$ represents the residual function,
                      and $W_{l} $ represents the weights of the corresponding unit. By applying the residual
                      structure recursively, the feature expression
                      of any deep unit can be obtained, which can be represented by formula (4).
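Under the same assumption, unrolling formula (3) across units gives

$$ x_{L}=x_{l}+\sum _{i=l}^{L-1}F(x_{i},W_{i}) \qquad (4) $$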
                  
                  
In formula (4), $x_{L} $ represents the feature of any deep unit, $\sum _{i=l}^{L-1}F $
                      represents the accumulated residual functions between units $l$ and $L$, and $W_{i} $
                      represents the weights of the $i$th unit. By combining formula (3) with formula (4),
                      the expression of a deep feature as a sum of residual functions can be derived, which can be represented
                      by formula (5).
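Taking the shallow unit back to the network input ($l=0$), the assumed form is

$$ x_{L}=x_{0}+\sum _{i=0}^{L-1}F(x_{i},W_{i}) \qquad (5) $$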
                  
                  
In formula (5), $x_{0} $ represents the input feature, so the feature of any deep unit equals the input
                      plus the sum of the residual functions of all preceding network layers. Through this
                      processing, the gradient in the DRSN always exists, which enables feature recognition
                      and extraction in tourism images.
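A minimal sketch of the three-layer bottleneck residual module of Fig. 2, assuming a PyTorch implementation with ELU activations and the 256-to-64-to-256 channel layout described above (the class and layer names are illustrative assumptions), is:

```python
import torch
import torch.nn as nn

class BottleneckResidual(nn.Module):
    """1x1 (256->64) -> 3x3 (64->64) -> 1x1 (64->256) with an identity shortcut."""
    def __init__(self, channels: int = 256, reduced: int = 64):
        super().__init__()
        self.reduce = nn.Conv2d(channels, reduced, kernel_size=1, bias=False)
        self.conv = nn.Conv2d(reduced, reduced, kernel_size=3, padding=1, bias=False)
        self.restore = nn.Conv2d(reduced, channels, kernel_size=1, bias=False)
        self.act = nn.ELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = self.restore(self.act(self.conv(self.act(self.reduce(x)))))
        # Identity shortcut: deep and shallow features are fused by addition,
        # so the gradient can always flow through the skip connection.
        return self.act(x + residual)

y = BottleneckResidual()(torch.randn(1, 256, 56, 56))
```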
                  
                
               
                     3.2 Design of a Tourism Image Retrieval Model Combining Attention Mechanism and Improved
                     DRSN
                  
Analysis of the improved DRSN model shows that it focuses mainly on feature extraction
                      and dimensionality reduction and may fail to capture the semantic information of images.
                      Semantic information is clearly important in tourism image retrieval, so a method
                      is needed that extracts image features and semantic information simultaneously. Based
                      on this, the study introduces an attention mechanism into the improved DRSN to construct
                      a retrieval model for tourism images [18,19]. The DRSN model that integrates the attention mechanism can recognize the approximate
                      foreground position of target objects in tourism images. The attention module used
                      in the study consists of two parts: the backbone (trunk) extraction module and the
                      branch mask extraction module. The backbone extraction module performs feature extraction,
                      and the branch mask extraction module performs feature selection. The output of the
                      attention module based on these two modules
                      can be represented by formula (6).
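Assuming the standard trunk-and-mask attention formulation, consistent with the symbols defined below, this is

$$ H_{i,c}(x)=M_{i,c}(x)\cdot T_{i,c}(x) \qquad (6) $$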
                  
                  
In formula (6), $M(x)$ represents the mask produced by the branch extraction module, $i$ ranges over
                      all spatial positions in the feature map, $c$ represents the channel index, and $T(x)$
                      represents the output of the backbone extraction module. The branch mask extraction
                      module of the attention mechanism in the DRSN model performs feature selection during
                      forward propagation and also acts as a filter on gradients during backward propagation.
                      The mask gradient of
                      the input feature can be represented by formula (7).
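Assuming the usual mask-filtered gradient for this trunk-and-mask structure, this is

$$ \frac{\partial \big(M(x,\theta )\,T(x,\phi )\big)}{\partial \phi }=M(x,\theta )\,\frac{\partial T(x,\phi )}{\partial \phi } \qquad (7) $$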
                  
                  
In formula (7), $\theta $ represents the parameters of the mask branch and $\phi $ represents the
                      parameters of the backbone branch. This property of the mask ensures that the attention
                      module is not affected by noise and also prevents noise from corrupting the gradients
                      used to update the backbone branch parameters. However, it was found that simply stacking
                      attention modules in the DRSN degrades the performance of the attention module to a
                      certain extent. Therefore, the study applies an identity mapping to the attention mask
                      module, and the output
                      of the attention module can be updated to formula (8).
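Assuming the standard attention residual learning form, consistent with the description that follows, this is

$$ H_{i,c}(x)=\big(1+M_{i,c}(x)\big)\cdot F_{i,c}(x) \qquad (8) $$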
                  
                  
In formula (8), $F(x)$ represents the original features of the image and $F_{i,c} (x)$ represents the
                      residual function. The range of $M(x)$ is [0,1]; as $M(x)$ approaches 0, the output
                      of the attention module approaches the original image features. There are therefore
                      certain differences between the constructed attention residual module and the original
                      residual network in the model. These differences can be described using the residual
                      learning expression, which can be expressed using formula
                      (9).
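Assuming the standard residual learning expression, this is

$$ H_{i,c}(x)=x+F_{i,c}(x) \qquad (9) $$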
                  
                  
In the original residual network, $F_{i,c} (x)$ represents the residual function,
                      whereas in the attention residual module constructed in this study, $F_{i,c} (x)$ represents
                      the features produced by the convolutional network to which attention is applied.
                      In the model fusing the attention mechanism with the DRSN, the attention residual
                      module uses the mask branch as a feature selector, thereby preserving the performance
                      of the backbone extraction branch. At the same time, it can quickly pass the original
                      image features to the next layer to reduce the loss of feature information. To enable
                      the residual extraction module to collect more feature information from the image,
                      the attention residual module is added to the feature extraction stage of the original
                      feature extraction process [20]. The image feature extraction process based on the attention residual network model
                      is shown in Fig. 3.
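A rough sketch of such an attention residual module, assuming simplified placeholder architectures for the trunk and mask branches (the exact branch designs used in the study are not reproduced here), is:

```python
import torch
import torch.nn as nn

class AttentionResidualModule(nn.Module):
    """Output H(x) = (1 + M(x)) * F(x): trunk features plus mask-weighted trunk features."""
    def __init__(self, channels: int = 256):
        super().__init__()
        # Trunk (backbone extraction) branch: plain convolutional feature extraction.
        self.trunk = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.ELU(),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
        )
        # Mask (branch extraction) branch: produces soft weights in [0, 1].
        self.mask = nn.Sequential(
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.trunk(x)
        m = self.mask(x)       # M(x) in [0, 1]; as M(x) -> 0 the output stays close to f
        return (1 + m) * f     # attention residual learning, as in formula (8)

out = AttentionResidualModule()(torch.randn(1, 256, 28, 28))
```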
                  
                  
                        
                        
Fig. 3. Image feature extraction flowchart based on attention residual network model.
                      
During feature extraction, each feature map is compressed into a single real-valued
                      descriptor that summarizes the information of the corresponding channel. By performing
                      global pooling over these feature maps,
                      a channel weight vector can be obtained, which can be represented by formula (10).
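Assuming the standard global average pooling (squeeze) operation, consistent with the symbols defined below, this is

$$ z_{c}=\frac{1}{H\times W}\sum _{i=1}^{H}\sum _{j=1}^{W}u_{c}(i,j) \qquad (10) $$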
                  
                  
In formula (10), $H$ represents the height of the feature map, $W$ represents its width, $u$ represents
                      the feature map being pooled, $z$ represents the resulting global attention value,
                      and $c$ is the channel index. After the vector weights are obtained, they are activated
                      using the exponential linear activation function, which
                      can be represented by formula (11).
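Assuming a standard squeeze-and-excitation style weighting, with $\delta (\cdot )$ the exponential linear activation and $\sigma (\cdot )$ the sigmoid function (both assumptions), this is

$$ s=\sigma \big(W_{2}\,\delta (W_{1}z)\big) \qquad (11) $$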
                  
                  
In formula (11), $W_{1} z$ represents the first weighting and activation operation over the pooled vector,
                      and $W_{2} $ represents the weights of the second network layer. The importance of each
                      feature map can be calculated using formula (11). Based on the above analysis, the
                      tourism image retrieval process that fuses the attention mechanism with the improved
                      DRSN is shown in Fig. 4.
                  
                  
                        
                        
Fig. 4. Flowchart of tourism image retrieval integrating the attention mechanism
                            and the improved DRSN.
                        
                      
                
             
            
                  4. Performance Analysis of Tourism Image Retrieval Model Based on Improved DRSN	
To verify the performance of the tourism image retrieval model, the Landscape-dataset
                   was used to test the model. The performance of the tourism image retrieval model based
                   on the improved DRSN was evaluated by training the model on this dataset.
               
               
                     4.1 Performance Analysis of Tourism Image Retrieval Models
To verify the performance of the tourism image retrieval model, the study visualized
                      the image features extracted by the model at each stage to give a more intuitive view
                      of how the neural network extracts features. Additional feature extraction results
                      were then used to analyze the feature extraction capability of the model. The visualization
                      results of feature extraction on landscape samples
                      are shown in Fig. 5.
                  
                  
                        
                        
Fig. 5. Visualization results of feature extraction from landscape samples.
                      
From Fig. 5, it can be seen that, after introducing the attention mechanism and the other improvements,
                      the feature extraction effect of the proposed model is very pronounced: non-feature
                      areas are suppressed so that the contours of the features stand out more clearly.
                      To verify the performance of the improved DRSN retrieval model, the Landscape-dataset
                      (https://github.com/koishi70/Landscape-Dataset) was used for testing; 6000 images were
                      selected and divided into two samples of higher and lower complexity, with 3000 tourism
                      images per sample. The Landscape-dataset is a large-scale natural landscape image dataset
                      created and maintained by the developer Yuweiming70. It contains tens of thousands of
                      high-quality landscape images that are carefully labeled and classified according to
                      different geographical environments and weather conditions. Each category has a large
                      number of samples, ensuring diversity and generalization ability during training. The
                      clear structure of the dataset provides a resource for researchers in deep learning
                      and computer vision to train and test models, especially for tasks such as landscape
                      classification and object detection. The comparison of the loss values of the three
                      methods on the two samples is shown in Fig. 6.
                  
                  
                        
                        
Fig. 6. Comparison of loss values of three methods on two datasets.
                      
According to Fig. 6(a), in Landscape-dataset sample 1 the loss value of the improved DRSN became relatively
                      stable after 51 iterations, with an average loss of 0.21. The fluctuations of the RNN
                      and CNN slowed after 49 and 47 iterations, respectively, but their amplitude remained
                      relatively large, with average losses of 0.48 and 0.62. From Fig. 6(b), in Landscape-dataset sample 2 the loss value of the improved DRSN flattened at 39
                      iterations and stabilized at 157 iterations, with an average loss of 0.16. The fluctuations
                      of the RNN and CNN slowed after 47 and 41 iterations, respectively, but their amplitude
                      remained relatively large, with average losses of 0.51 and 0.93. This indicated that
                      the tourism image retrieval model based on the improved DRSN was more robust. To verify
                      the accuracy of the retrieval model for image retrieval on this dataset, the study also
                      compared the image retrieval accuracy of the above methods. The comparison of image
                      retrieval accuracy of the three methods on the two samples is shown in Table 1.
                  
In Table 1, in the sample 1 test, the accuracy of the improved DRSN stabilized after 118 iterations
                      at 94.52%. The accuracy of the RNN stabilized after 135 iterations at 83.94%, and that
                      of the CNN stabilized after 157 iterations at 72.52%. In the sample 2 test, the image
                      retrieval accuracy of the improved DRSN was 90.88%, while the image retrieval accuracies
                      of the RNN and CNN were 81.72% and 75.88%, respectively. This indicated that, in terms
                      of image retrieval capability, the image retrieval model based on the improved DRSN
                      had higher accuracy and could increase the probability of images being retrieved. To
                      further validate the performance of the model in image retrieval, the study compared
                      image precision and recall as validation metrics. As shown in Fig. 7, the comparison results of the precision and recall of the three methods in the image
                      retrieval process are presented.
                  
                  
                        
                        
Table 1. Comparison of image retrieval accuracy of three methods on two datasets.

                                  | Algorithm | Data set | Iterations required to reach convergence | Retrieval accuracy (%) |
                                  | CNN | Sample 1 | 157 | 72.52 |
                                  | CNN | Sample 2 | 182 | 75.88 |
                                  | RNN | Sample 1 | 135 | 83.94 |
                                  | RNN | Sample 2 | 164 | 81.72 |
                                  | Improved DRSN | Sample 1 | 118 | 94.52 |
                                  | Improved DRSN | Sample 2 | 106 | 90.88 |
                        
                     
                   
                  
                        
                        
Fig. 7. Comparison results of precision and recall of three methods in two datasets.
                      
According to Fig. 7(a), all three methods performed well in tourism image retrieval. The image retrieval
                      precision of the improved DRSN was 92.61%, while the precisions of the RNN and CNN
                      were 88.95% and 86.13%, respectively. According to Fig. 7(b), all three methods also achieved good recall in tourism image retrieval. The recall
                      of the improved DRSN was 96.48%, while the recalls of the RNN and CNN were 91.05% and
                      89.22%, respectively. This indicated that the tourism image retrieval model based on
                      the improved DRSN had strong robustness and applicability.
                  
                
               
                     4.2 Application Performance Analysis of Tourism Image Retrieval Models
To verify the practical application performance of the tourism image retrieval model,
                      this study compared it with the Average Hash Algorithm (AHA). AHA converts an image
                      into a grayscale image, calculates the average brightness of the image, and then compares
                      the value of each pixel with the average brightness to generate a hash value, thereby
                      achieving image retrieval while maintaining high accuracy and efficiency. The volume
                      of tourism data is huge, and the efficiency of the model is an important consideration;
                      the fast computation of AHA and its independence from image size make it efficient when
                      processing large amounts of data, which matches practical applications well. Therefore,
                      AHA was chosen as the comparison algorithm for the study. In tourism images, different
                      lighting conditions and occlusion can affect image retrieval and recognition, so the
                      study compared the retrieval model with AHA to verify its performance in tourism image
                      retrieval under different environments. As shown in Fig. 8, the recognition accuracy of the two methods for the same tourism images in different
                      environments is presented.
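For reference, a minimal average-hash sketch consistent with the description above is given below; the 8×8 resizing and Hamming-distance comparison are common conventions assumed here rather than details taken from the study:

```python
import numpy as np
from PIL import Image

def average_hash(path: str, hash_size: int = 8) -> np.ndarray:
    """Average Hash: resize, convert to grayscale, threshold each pixel against the mean."""
    img = Image.open(path).convert("L").resize((hash_size, hash_size))
    pixels = np.asarray(img, dtype=np.float32)
    return (pixels > pixels.mean()).flatten()   # 64-bit boolean hash

def hamming_distance(h1: np.ndarray, h2: np.ndarray) -> int:
    """Smaller distance means the two images are more likely to match."""
    return int(np.count_nonzero(h1 != h2))

# Retrieval works by ranking database images by Hamming distance to the query hash.
```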
                  
                  
                        
                        
Fig. 8. The recognition accuracy of two methods for the same tourism image in different
                           environments.
                        
                      
As shown in Fig. 8(a), under normal lighting conditions the improved DRSN had the highest recognition
                      accuracy for tourism images: of 300 images, 269 were accurately identified, for an
                      average accuracy of 89.51%, while AHA accurately identified 249 of the 300 images,
                      for an average accuracy of 82.97%. As shown in Fig. 8(b), the image recognition accuracy of the improved DRSN was also higher than that of
                      AHA for tourism images in dimly lit environments: the average accuracy of the retrieval
                      model was 59.64%, with 179 of 300 images accurately identified, while AHA accurately
                      identified 157 of 300 images, for an average accuracy of 52.33%. According
                      to Fig. 8(c), in an occluded environment the accuracy of both methods decreased significantly,
                      indicating that occlusion has a considerable impact on image retrieval; the two methods
                      accurately recognized 124 and 116 of the 300 images, with accuracy rates of 41.25% and
                      38.66%, respectively. To verify the effect of images captured from different angles on
                      the model's retrieval, experiments were conducted in three further settings, and
                      the results are shown in Fig. 9.
                  
                  
                        
                        
Fig. 9. Comparison of recognition results of the two methods at different
                            observation angles.
                        
                      
In Fig. 9(a), the average recognition rate of the improved DRSN reached 93.52% at a horizontal
                      angle, with 468 of 500 images accurately identified, while AHA accurately identified
                      444 of 500 images, for an average recognition rate of 89.07%. As shown
                      in Fig. 9(b), the retrieval ability of both methods decreased slightly at an elevation angle:
                      the average recognition rate of the improved DRSN was 79.68%, while that of the traditional
                      method was 73.14%, with 400 and 366 of the 500 images accurately recognized, respectively.
                      As shown in Fig. 9(c), under vertical rotation the image retrieval ability of both methods decreased significantly;
                      they accurately recognized 216 and 201 of the 500 images, with recognition accuracies
                      of 43.27% and 40.04%, respectively. This verified that the retrieval model retained
                      higher recognition ability than AHA under different shooting angles. To further verify
                      the operational capability of the retrieval model, the study used retrieval time as a
                      comparative indicator and compared the time consumption of the improved DRSN and AHA
                      with the actual values. As shown in Fig. 10, the time consumption results of the three methods in tourism image retrieval are presented.
                  
                  
                        
                        
Fig. 10. Comparison of time consumption of three methods in datasets with different
                           amounts of data.
                        
                      
According to the comparative analysis of Figs. 10(a), 10(b), and 10(c), the retrieval time of the true value was 0.95 s,
                      the retrieval time of the improved DRSN was 1.28 s, and the retrieval time of AHA was
                      1.46 s. In the time dimension, the improved DRSN was closest to the true value, with
                      a difference of 0.27 s, indicating that the retrieval model designed in the study could
                      retrieve tourism images effectively.
                  
                
             
            
                  5. Conclusion	
To improve the retrieval and recognition of tourism images, an improved DRSN based
                   on an attention mechanism was proposed and used to design an image retrieval and
                   recognition model. The study first used the deactivation mechanism and activation
                   functions to improve the DRSN and applied it to feature extraction from tourism images;
                   an attention mechanism was then introduced on top of the improved network to construct
                   the retrieval model. The results showed that under normal lighting, low lighting, and
                   occlusion, the image retrieval accuracy of the retrieval model was 89.51%, 59.64%, and
                   41.25%, respectively. At horizontal, elevation, and vertical rotation angles, the model's
                   retrieval and recognition accuracy was 93.52%, 79.68%, and 43.27%, respectively. The
                   retrieval model completed image retrieval in 1.28 s, which was very close to the true
                   value, with a difference of 0.27 s. On all comparison indicators, the performance of
                   the retrieval model was superior to that of the comparison methods. This indicated that
                   the proposed attention-based improved DRSN tourism image retrieval method has clear
                   advantages in retrieval accuracy and efficiency, providing a new and effective solution
                   for tourism image retrieval that is of great significance for practical applications.
                   However, the research still has certain shortcomings: the study retrieved images from
                   only a limited number of datasets. In the future, the application of this method to
                   other datasets can be explored and its performance and robustness verified through
                   more experiments.
               
             
          
         
            
                  
                     REFERENCES
                  
                     
                        
                        C. C. Chiu, W. J. Wei, L. C. Lee, and J. C. Lu, ``Augmented reality system for tourism
                           using image-based recognition,'' Microsystem Technologies, vol. 27, no. 4, pp. 1811-1826,
                           2021.

 
                     
                        
                        S. Zulzilah, E. Prihantoro, and S. Masitoh, ``The image tourism destinations of Bandung
                           in social media network,'' International Journal of Multicultural and Multireligious
                           Understanding, vol. 6, no. 10, pp. 72-83, 2019.

 
                     
                        
                        M. Hasanvand, M. Nooshyar, E. Moharamkhani, and A. Selyari, ``Machine learning methodology
                           for identifying vehicles using image processing,'' Artificial Intelligence and Applications,
                           vol. 1, no. 3, pp. 170-178, 2023.

 
                     
                        
Y. Duan, J. Wang, H. Ma, and Y. Sun, ``Residual convolutional graph neural network
                           with subgraph attention pooling,'' Tsinghua Science and Technology, vol. 27, no. 4,
                           pp. 653-663, 2021.

 
                     
                        
                        Z. Yang, J. Shang, Z. Zhang, Y. Zhang, and S. Liu, ``A new end-to-end image dehazing
                           algorithm based on residual attention mechanism,'' Gongye Daxue Xuebao/Journal of
                           Northwestern Polytechnical University, vol. 39, no. 4, pp. 901-908, 2021.

 
                     
                        
                        H. Han, L. Zhuo, J. Li, J. Zhang, and M. Wang, ``Blind image quality assessment with
                           channel attention based deep residual network and extended LargeVis dimensionality
                           reduction,''  Journal of Visual Communication and Image Representation, vol. 80, no.
                           10, 103296, 2021.

 
                     
                        
                        X. Su, Q. Zheng, Q. Zheng, and W. Xu, ``Effects of environmental attractiveness and
tourism image cognition of ecotourism on customer satisfaction,'' Journal of Environmental
                            Protection and Ecology, vol. 21, no. 2, pp. 783-789, 2020.

 
                     
                        
                        M. Maree, A. Rattrout, M. Altawil, and M. Belkhatir, ``Multi-modality search and recommendation
                           on Palestinian cultural heritage based on the holy-land ontology and extrinsic semantic
resources,'' Journal on Computing and Cultural Heritage (JOCCH), vol. 14, no. 3, pp.
                           1-23, 2021.

 
                     
                        
                        P. Zheng, J. Li, J. Wang, H. Cheng, and Q. Wang, ``The coupling coordination of relationships
                           between tourism destination image and product country image,'' International Journal
                           of Tourism Research, vol. 23, no. 5, pp. 858-870, 2021.

 
                     
                        
                        E. Ageeva and P. Foroudi, ``Tourists' destination image through regional tourism:
                           From supply and demand sides perspectives,'' Journal of Business Research, vol. 101,
                           pp. 334-348, 2019.

 
                     
                        
                        D. Zhou, Y. Qian, Y. Ma, Y. Fan, J. Yang, and F. Tan, ``Low illumination image enhancement
                           based on multi-scale CycleGAN with deep residual shrinkage,'' Journal of Intelligent
                           & Fuzzy Systems, vol. 42, no. 3, pp. 2383-2395, 2022.

 
                     
                        
                        J. Yan, M. Shi, X. Lv, Y. Zhang, and Y. Ma, ``An inversion method for coupled typical
                           error sources based on remote sensing image,'' Journal of Imaging Science & Technology,
                           vol. 66, no. 6, 060503, 2022.

 
                     
                        
                        R. S. Thakur, R. N. Yadav, and L. Gupta, ``State‐of‐art analysis of image denoising
                           methods using convolutional neural networks,'' IET Image Processing, vol. 13, no.
                           13, pp. 2367-2380, 2019.

 
                     
                        
                        W. Xie, M. Cui, M. Liu, P. Wang, and B. Qiang, ``Deep hashing multi-label image retrieval
with attention mechanism,'' International Journal of Robotics & Automation, vol. 37,
                           no. 4, pp. 372-381, 2022.

 
                     
                        
                        L. Shan, M. Yu, J. Xia, J. Xin, C. Deng, and L. Zhu, ``Overlapped spectral demodulation
                           of fiber Bragg grating using convolutional time-domain audio separation network,''
                           Optical Engineering, vol. 62, no. 6, 066104, 2023.

 
                     
                        
                        E. V. Diana and M. Sumathi, ``An intelligent deep learning architecture using multi-scale
                           residual network model for image interpolation,'' Journal of Advances in Information
                           Technology, vol. 14, no. 5, pp. 970-979, 2023.

 
                     
                        
                        Q. Wang, J. Lai, Z. Yang, K. Xu, and L. Lei, ``Improving cross-dimensional weighting
                           pooling with multi-scale feature fusion for image retrieval,'' Neurocomputing, vol.
                           363, no. 10, pp. 17-26, 2019.

 
                     
                        
                        Y. Zhu, Y. Wang, H. Chen, Z. Zuo, and Q. Huang, ``Large-scale image retrieval with
                           deep attentive global features,'' International Journal of Neural Systems, vol. 33,
                           no. 3, pp. 13-30, 2023.

 
                     
                        
                        Y. Li, Z. He, Z. Zhang, W. Zhang, P. Chatterjee, and D. Pamucar, ``A novel feature
                           aggregation approach for image retrieval using local and global features,'' CMES-Computer
                           Modeling in Engineering & Sciences, vol. 131, no. 1, pp. 239-262, 2022.

 
                     
                        
                        Z. Wang, ``Video summarization generation with self-attention and random forest regression,''
                           Proc of Second International Symposium on Computer Applications and Information Systems
(ISCAIS 2023), SPIE, vol. 12721, no. 6, pp. 349-356, 2023.

 
                   
                
             
            Author
            
            
Renbi Zhao graduated from Guangdong Polytechnic Normal University in 2002 with a
               bachelor’s degree in tourism management and service education. Currently, she holds
               the position of Dean of the School of Management at Guangdong Nanfang Institute of
               Technology with the title of Associate Professor. She is recognized as an Outstanding
               Teacher in Private Education in Guangdong Province, an Outstanding Young Science and
               Technology Pioneer in Jiangmen City, and an Outstanding Teacher in Jiangmen City.
               She has published over twenty papers in various national journals, with her research
               primarily focusing on tourism culture and tourism
               			resource development.