
2024


  1. (School of Electronic Information and Electrical Engineering, Huizhou University, Huizhou 516007, China)
  2. (School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510641, China)
  3. (Foshan “Smart City” Joint Innovation Laboratory, Foshan, 528000, China)



Keywords: Artificial intelligence, HCI, Computer-aided analysis, Bioinformatics

1. Introduction

With the constant development and innovation of smart technology, Smart Wearable Devices (SWD) and Brain-Computer Interface (BCI) technologies have become essential components of daily life and work [1]. These technologies not only make daily life more convenient but also create a more immersive experience space. Immersive Experiences (IE) refer to experiences that immerse the user in Multi-Source Information (MSI) inputs by means of virtual reality, augmented reality, and related technologies [2]. Compared with traditional information extraction technologies, SWD can comprehensively sense the user’s physical and behavioral characteristics, thereby enhancing the overall sense of immersion [3, 4]. BCI technology enables human-computer interaction by reading and decoding brain activity, allowing users to interact with virtual environments more naturally and increasing the realism of IE [5]. Therefore, this study leverages SWD and BCI technologies to perceive and fuse the user’s MSI, comprising physiological indicators, movement data, and EEG signals. The objective is to enhance the realism and personalization of the IE, delivering a more immersive, natural, and comfortable experiential environment through virtual reality, augmented reality, and similar technologies. The main innovation of this study is to optimize movement and emotion recognition through MSI fusion, synthesize multiple physiological signal analyses, and model the interactions between physiological signals, emotion, and movement. This addresses the poor integration across disciplines and the lack of quantitative evaluation of IEs. The main contribution of this research is to provide new insights into movement recognition and emotion analysis.
It also advances the theoretical framework of physiological psychology and human-computer interaction design. By conducting a comprehensive analysis of the correlation between physiological signals and user behavior, it is possible to gain a deeper insight into the relationship between emotional states and movement. A system combining emotion recognition and movement monitoring can be applied in health monitoring, rehabilitation training, and psychotherapy. Moreover, it can provide personalized feedback and support to patients to promote their recovery and treatment. The first section of the study presents an objective description of relevant research and highlights the existing issues and challenges in implementing immersive perception and Multi-Source Information Fusion (MSIF) technology. The second section outlines the research methodology and implementation steps for fusing SWD and BCI technologies. The third section provides empirical evidence to validate the effectiveness and feasibility of the developed system and technology through experimental studies. Finally, the fourth section summarizes the study’s findings and discusses potential areas for future research. This study aims to investigate the applications of SWD and BCI in fields such as education and healthcare, offering technical support and insight in these areas.

2. Related Works

Traditional IE technology relies on external devices, such as joysticks and keyboards, for inputting information. This requires users to manually input commands to interact with the virtual environment. Algargoosh et al. used virtual reality as a platform in conjunction with SWD to analyze the impact of audiovisually consistent sound environments on the sense of human experience. The results showed that sound environments and audiovisual consistency amplified the intensity of emotional impact [6]. Patterson et al. proposed the use of portable, water-resistant, immersive virtual reality hardware as a non-drug aid measure to enhance safety and reduce care costs. The results showed that virtual reality could effectively reduce pain during standard debridement procedures [7]. Putra et al. proposed strategies for using virtual reality approaches to improve data quality and quantity. The results showed that the use of VR role-playing and autonomous driving methods could effectively improve the sense of participation of family participants, make the study process more enjoyable, and improve the quality and depth of data [8]. In response to the problem of students’ game innovation and creative learning process, Wong proposed the method of using "immersive" game environment to realize self-regulation learning practice. The research results showed that immersive games had a promising application prospect in game innovation education. The use of situational learning teaching method provided students with an effective discovery learning process [9]. Carlo et al. proposed a way to use virtual reality technology for training. By assessing participants’ ability to perform basic tasks in VR, real-world skill transfer, and individual characteristics, the study provided results on the effectiveness of VR training in this population. 
The results showed that the vast majority of participants successfully learned through VR, achieving reality transfer and generalization of skills, and the study identified a relationship between adaptive functioning and VR training success [10].

To cope with the challenges of synchronization and the large number of fusion targets during image fusion, Wang et al. used an unsupervised learning model for synchronized multi-band image fusion to ensure the fusion of multi-level information. Experiments indicated the superiority and rationality of the unsupervised learning model [11]. Yang et al. proposed an MSI fusion model to evaluate the safety and reliability of bamboo integrated products. The results showed that the proposed method was more reliable than the traditional method. In general, this method provided a framework for safety and reliability evaluation under multiple data sources and was conducive to the comprehensive application of diverse data [12]. Wang proposed a comprehensive evaluation method that derives the evaluation results by optimizing the fusion weights and processing conflicting information sources. The experimental results showed that the proposed method could effectively isolate the trend term and the seasonal term, retain the outliers in the residual term, and successfully detect all obvious outliers at the data preprocessing stage, thus ensuring the high reliability of multi-source monitoring data [13]. Zhu et al. proposed a new belief Renyi divergence for measuring differences between pieces of evidence in the Dempster-Shafer theory of evidence. Based on the proposed belief Renyi divergence measure, a novel MSIF method was designed. A comprehensive analysis and a series of experiments were conducted to verify the practicality and efficacy of this method in MSIF [14]. Hua proposed an improved belief Hellinger divergence measure that takes full account of the uncertainty in the basic probability assignment to quantify the level of conflict between pieces of evidence. Meanwhile, the reliability of the MSIF strategy was determined based on external difference and internal ambiguity.
The effectiveness of the proposed method was proved through fault diagnosis and iris dataset classification [15]. Song proposed an MSIF meta-learning network with convolutional block attention module. Through the designed multi-branch fusion structure, complementary and rich fault-related features in multi-source monitoring data could be fully extracted and utilized. The effectiveness and superiority of the proposed method were fully demonstrated through two bearing data sets covering multi-source monitoring data [16].

In summary, the integration of SWD and BCI technologies can enhance immersive perceptions and improve the processing of MSI data. This fusion can be applied not only to the IE of virtual worlds but also to provide technical support for decision-making. Therefore, the aim is to bring a more natural and convenient interactive experience to users.

3. Optimization Analysis of IE Spatial Awareness and MSIF Techniques

The study combines SWD and BCI technologies, starting from two aspects: enhancing the development of the IMSIF system and improving MSIF technology. It then analyzes how SWD and BCI play a role in spatial sensing and MSIF, respectively.

3.1. System Implementation of Spatial Sensing by Integrating Smart Helmet Devices and BCIs

With the continuous development of science and technology, people have become more demanding in their interaction with the environment [17]. IE spatial perception technology is a kind of cutting-edge technology that has emerged in this context [18]. It enables users to interact with computers in a more natural and intuitive way by interacting with virtual environments, thus achieving a more realistic experience feeling [19]. Fig. 1 shows common SWDs.

Fig. 1. Common intelligent wearable devices.


Whether it is a smart watch, smart bracelet, smart glasses, or smart helmet, these devices can be connected to smartphones and other devices through various built-in sensors. They can collect the user’s physiological and behavioral data and transmit them to smartphones and other devices for processing. This can help users better understand their physical condition, location, and other information for a smarter, more convenient life. Wavelet time-frequency analysis is usually used in signal pre-processing to identify and process the signal; the output equation of the low-pass filter is shown in Eq. (1).

(1)
$ x_{a,L}[n] = \sum_{k=0}^{K-1} x_{a-1,L}[2n-k]\,g[k]. $

In Eq. (1), $g$ is the low-pass filter, $K$ is the filter length, $n$ is the sample index, and $x[n]$ is the input signal. Eq. (1) describes the output of a low-pass filter, with the objective of eliminating high-frequency noise and preserving the low-frequency component of the signal. In the study, this equation is used to identify and process physiological signals from SWDs, thereby ensuring that subsequent data analysis is based on a cleaner signal. The output equation of the corresponding high-pass filter is shown in Eq. (2).

(2)
$ x_{a,H}[n] = \sum_{k=0}^{K-1} x_{a-1,H}[2n-k]\,h[k]. $

In Eq. (2), $h$ is the high-pass filter and $a$ is the decomposition order. Eq. (2) removes low-frequency noise and retains only the high-frequency components of the signal. This assists in the examination of rapidly changing signal characteristics when processing data, such as instantaneous alterations in EMG signals. Then the filter outputs are weighted and summed, and the equation for the weighted sum is shown in Eq. (3).

(3)
$ y[n] = \sum_{i=0}^{N} b_i \cdot x[n-i]. $

In Eq. (3), $N$ is the filter order and $b_i$ is the $i$th filter coefficient. Eq. (3) assigns weights to the filter outputs, thereby integrating data from multiple sensors. In the process of MSIF, the weighted sum of different signals serves to enhance the reliability and accuracy of the overall data. The expression for the coefficients $b_i$ is shown in Eq. (4).

(4)
$ b_k = b_{n+2-k},~k = 1,~2,~\dots,~n+1. $

In Eq. (4), $k$ indexes the filter coefficients, and the symmetry $b_k = b_{n+2-k}$ ensures that the necessary feature information is retained during signal processing. Among these wearable smart devices, smart helmets can track the user’s head position and line of sight, as well as recognize the user’s voice commands, to achieve a more natural and intuitive interaction. Moreover, BCI technology can control external devices by decoding brain activity signals, allowing the SWD to be controlled by intention and enabling a more natural and intuitive interaction with the surrounding environment. The reason for focusing on the integration of BCIs and SWDs is to enhance the naturalness of human-computer interaction, strengthen personalized and adaptive capabilities, and overcome action limitations. The functionality of SWDs can be controlled by the user’s brain activity, with electrical signals collected in real time via electrodes or other sensors attached to the scalp through a BCI. Following appropriate pre-processing, the collected signal is analyzed and decoded by a machine learning or signal processing algorithm, and the decoded instructions are transmitted to the SWD. Therefore, in order to improve IE spatial perception technology, the study integrates BCI technology with the smart helmet device, so that the user can experience the virtual environment more realistically. Pre-processing of the EEG signal is one of the important steps in BCI technology, so the study adopts a spatial filtering algorithm, the common average reference, to extract the characteristics of the electrode signals; its expression is shown in Eq. (5).

(5)
$ V_i^{CAR} = V_i^{ORIG} - \frac{1}{m} \sum_{j=1}^{m} V_j^{ORIG}. $

In Eq. (5), $V_i^{ORIG}$ is the original peak of channel $i$, and $m$ is the number of selected channels. Eq. (5) is used to extract characteristic peaks from the electrode signals; the key is to effectively identify the instantaneous peaks of brain activity from the processed EEG data. Then the wavelet transform is used for signal feature recognition; its expression is shown in Eq. (6).

(6)
$ W(a,b) = \int_{-\infty}^{\infty} x(t) \frac{1}{\sqrt{a}} \psi \left( \frac{t-b}{a} \right) dt. $

In Eq. (6), $x(t)$ is the signal produced by brain activity or external events, $\psi(t)$ is the wavelet basis, and $a$ and $b$ are the scale and translation variables, respectively. Eq. (6) is employed to analyze the instantaneous characteristics and frequency components of the signal, facilitating the effective capture of changes in transient signals. In this research, it enables in-depth analysis of the EMG, including dynamic movement features, and enhances the understanding of the user’s movement and psychological state. The final IE spatial perception system based on the BCI and smart helmet is shown in Fig. 2.

Fig. 2. Immersive experience space perception system.


Fig. 2 shows a framework for studying immersive spatial perception systems based on BCIs and smart helmets, in which BCIs play a key role in interpreting electrical signals from the brain directly into commands that can drive interactions in virtual environments. For example, by interpreting the brain’s response to specific visual or auditory stimuli, the BCI can “touch” or “move” virtual objects. A smart helmet is a hardware device that integrates an array of sensors that capture information about the movement and position of the head and transmit it to a processing unit. This allows the user’s movements and observations in the virtual environment to be mapped to the virtual world in real time. The system’s processing unit receives data from the BCI and the smart helmet and converts this data into interactive commands to the virtual environment through advanced signal processing algorithms. To provide a more realistic IE, the system provides immediate feedback, including visual feedback (e.g., 3D images), auditory feedback (e.g., sound effects or music), and haptic feedback (e.g., vibration or temperature changes). Moreover, the user interface enables users to interact with the system, including BCI’s hardware devices, smart helmets, and interfaces for entering commands and receiving feedback. The roadmap for the realization of one of these immersive spatial perception technologies is shown in Fig. 3.

Fig. 3. Immersive experience space perception technology.


In spatial perception technology, BCI can capture electrical signals from the brain, which include but are not limited to visual, auditory, tactile, motor and cognitive. By interpreting these signals, BCI can determine the user’s intention and thus directly translate the user’s thinking into interactive commands to the virtual environment. The smart helmet, on the other hand, captures head movement and position information through built-in sensors and transmits this information to the processing unit in real time. The processing unit determines the user’s perspective and position based on this information, thus mapping the user’s observations in reality to the virtual environment. By doing so, BCI and the smart helmet together create a new, immersive spatial perception experience.
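As a minimal illustration, the common average reference filtering of Eq. (5) used in the EEG pre-processing above can be sketched in a few lines of Python; the channel amplitudes below are hypothetical values, not recorded data.

```python
# Sketch of the common average reference (CAR) spatial filter of Eq. (5):
# each channel is re-referenced against the mean over all m selected channels.
def car_filter(v_orig):
    """Apply V_i^CAR = V_i^ORIG - (1/m) * sum_j V_j^ORIG to one time sample."""
    m = len(v_orig)
    mean = sum(v_orig) / m
    return [v - mean for v in v_orig]

# Hypothetical per-channel EEG amplitudes (microvolts) at one time instant.
channels = [12.0, 8.0, 10.0, 14.0]
filtered = car_filter(channels)  # [1.0, -3.0, -1.0, 3.0]
```

By construction the re-referenced channels sum to zero, which removes components common to all electrodes, such as a shared reference drift.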

3.2. IMSIF Technology Improvement

With the continuous development of science and technology, people’s reliance on and demand for intelligent devices are increasing [20]. Especially in the information age, the acquisition, processing, and utilization of information have become the key to various applications [21]. SWD and BCI, in addition to their application potential in spatial perception systems, can also satisfy people’s demand for more efficient, convenient, and accurate information. In IE, compared with spatial perception, MSIF technology places more emphasis on the accuracy and comprehensiveness of situational awareness. Therefore, the study integrates the BCI and smart helmet to improve MSIF technology. The instantaneous energy of the acquired signal in the MSIF pipeline is shown in Eq. (7).

(7)
$ E(t) = S_i^2(t). $

In Eq. (7), $S_i(t)$ is the original signal from source $i$. Eq. (7) describes the instantaneous signal energy, a pivotal quantity for the integration of multi-source data that can reflect the intensity of the user’s physiological state in a particular context. The average energy of the signal acquired by the sensor is then computed as shown in Eq. (8).

(8)
$ E_{avg}(t) = \frac{1}{W} \sum_{i=t-W+1}^{t} E(i). $

In Eq. (8), $W$ is the window length and $t$ is the time. In the improved MSIF technique, the input and processing of joint motion signals are performed through sensors worn on various parts of the body, and the joint angle is calculated as shown in Eq. (9).

(9)
$ r = |a| + |b|. $

In Eq. (9), $a$ and $b$ are the data recorded by different sensors. Eqs. (8) and (9) assist the system in comprehending the user’s body posture and movement patterns. This is achieved by calculating data recorded by multiple sensors in order to obtain the average energy and joint angles. Then it is binarized and the result is shown in Eq. (10).

(10)
$ R_t = \begin{cases} 1, & p > G, \\ 0, & p \le G. \end{cases} $

In Eq. (10), $p$ is the information real-time value. Moreover, $G$ is the judgment threshold. The binary operation of Eq. (10) is capable of extracting key information from complex data during the process of information processing. This is of particular importance for real-time monitoring and decision support. The technology roadmap for the MSI technology improvement process is shown in Fig. 4.

Fig. 4. Emg processing by multi-source information technology.


In the improved MSI technology, various physiological parameters of the human body, such as heart rate, blood pressure, blood glucose, body temperature, and EEG, as well as environmental parameters, such as temperature, humidity, ultraviolet rays, and air quality, are first acquired by SWD and BCI technology; this information is acquired in real time by the smart sensors and transmitted to the processing center. In the information processing stage, the acquired data are cleaned, analyzed, and processed by algorithms and models, mainly for noise reduction and optimization of Electro-Myographic Signals (EMS). After that, information from different sources and media is fused to produce more comprehensive and accurate information, which is finally fed back to the user in visual, auditory, and tactile forms. The equation for processing the EMS is shown in Eq. (11).

(11)
$ H(x) = \frac{1}{\sqrt{|\sigma|}} \int x(t)\,\psi^{*}\!\left(\frac{t-\tau}{\sigma}\right)dt. $

In Eq. (11), $x(t)$ is the signal, $\sigma$ is the scale parameter, $\tau$ is the time parameter, $\psi$ is the basis function, and $\psi^{*}$ denotes its complex conjugate. Eq. (11) processes the EMG signal through the selected basis function, thereby enhancing the recognition rate of signal features. This is a pivotal step in movement recognition and rehabilitation training. A frequency optimization function is used to optimize the features of the EMS; its expression is shown in Eq. (12).

(12)
$ \sum_{j=0}^{N-1} |x[j]|^2 = \frac{1}{N} \sum_{k=0}^{N-1} X[k]X^{*}[k] = \sum_{k=0}^{N-1} p[k]. $

In Eq. (12), $p[k]$ is the power spectrum, $k$ is the spectral coefficient index, and $N$ is the signal length. The frequency optimization of Eq. (12) can improve the effectiveness of target recognition in signal analysis, especially in processing rapidly changing physiological signals, and ensure accuracy. Based on the above analysis, this study explores the use of SWDs and BCI for data collection. The collected original signal is denoised and feature-extracted, and the data are cleaned by filtering and transformation. High-frequency noise is removed using wavelet transforms and filtering to improve signal quality, e.g., low-pass and high-pass filtering using Eqs. (1) and (2), to ensure that subsequent analyses are based on good signals. By analyzing the EEG and EMG signals, features related to the user’s emotional state and actions are extracted. In the feature extraction stage, bandpass filters and the wavelet transform are used to ensure that important dynamic features in the signal are captured. The information obtained from different sensors is fused to form a comprehensive information input. In this process, the data must be weighted to improve their accuracy and credibility: Eq. (11) is used to extract the EMG features, and Eq. (12) is used to optimize the frequency. By optimizing the frequency coefficients, noise components can be effectively removed, ensuring that the recognition rate is maintained for rapidly changing physiological signals and improving the performance of action recognition.
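The convolve-and-decimate filtering of Eqs. (1) and (2) used in this preprocessing can be sketched as below; the Haar filter pair and the toy signal are illustrative assumptions, not the actual filter design used in the study.

```python
import math

# One analysis level of the filter bank in Eqs. (1) and (2):
# y[n] = sum_k x[2n - k] * f[k], i.e., convolution followed by downsampling.
def analysis_level(x, g, h):
    K = len(g)
    out_len = len(x) // 2
    low, high = [], []
    for n in range(out_len):
        lo = sum(x[2 * n - k] * g[k] for k in range(K) if 0 <= 2 * n - k < len(x))
        hi = sum(x[2 * n - k] * h[k] for k in range(K) if 0 <= 2 * n - k < len(x))
        low.append(lo)
        high.append(hi)
    return low, high

# Haar analysis filters (an assumed, simple choice of g and h).
g = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # low-pass
h = [1 / math.sqrt(2), -1 / math.sqrt(2)]  # high-pass

x = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0]  # toy input signal
low, high = analysis_level(x, g, h)
```

The low-pass branch keeps the slow trend of the signal while the high-pass branch isolates rapid changes, matching the roles Eqs. (1) and (2) play in the pre-processing described above.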

Then, the extracted data of all parties are entered into the data set and classified. In which the EEG signal acquisition for BCI is performed, the distribution of brain electrodes is shown in Fig. 5.

Fig. 5. Distribution of brain electrodes.


In MSIF technology, brain electrodes are mainly used for signal acquisition, which captures electrical signals from the brain directly. These signals include, but are not limited to, visual, auditory, tactile, motor, and cognitive. Specifically, brain electrodes record the activity of neurons in the brain to obtain information about cognitive, emotional, motor, and other neural activities. In MSIF, these EEG signals can be fused with data from other types of sensors, such as health monitoring data from sensors worn on various parts of the body, various environmental parameters from smart homes, and so on. Through this fusion, a more comprehensive and accurate judgment and understanding of the user’s state and needs can be made from multiple perspectives and levels. The primary objective of the IMSIF technology is to enhance the precision and dependability of data, fortify the comprehensive comprehension of knowledge, and facilitate real-time responsiveness and feedback. By integrating data from disparate sensors, the IMSIF technology can eliminate noise and errors that may be introduced by a single data source, thereby improving the accuracy and reliability of data. The system is capable of forming a more comprehensive understanding of the user’s state through the fusion of multiple data sources. This multidimensional data analysis method facilitates a more comprehensive understanding of the user’s situational perception and psychological state. The IMSIF technology is capable of processing data from disparate sensors, expeditiously generating response commands, monitoring the user’s physiological signals in real time, and adjusting the training plan based on the data analysis results to align with the individual needs of the user.
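A compact sketch of the fusion-side signal handling described by Eqs. (7), (8), and (10) follows; the sample values and the threshold $G$ are hypothetical, chosen only for illustration.

```python
# Instantaneous energy E(t) = S_i(t)^2 (Eq. (7)), a length-W sliding-window
# average (Eq. (8)), and threshold binarization R_t (Eq. (10)).
def instantaneous_energy(s):
    return [v * v for v in s]

def windowed_average(e, W):
    # Average of the last W energy samples at each time t (shorter at the start).
    return [sum(e[max(0, t - W + 1):t + 1]) / W for t in range(len(e))]

def binarize(p, G):
    return [1 if v > G else 0 for v in p]

# Hypothetical sensor samples; the threshold G = 1.0 is illustrative.
signal = [0.1, 0.5, 2.0, 1.5, 0.2]
energy = instantaneous_energy(signal)
avg = windowed_average(energy, W=2)
flags = binarize(avg, G=1.0)  # 1 where the recent average energy exceeds G
```

The binary flags extract the key "active vs. inactive" information from the raw stream, which is what makes real-time monitoring and decision support tractable.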

4. Device Fusion-based Spatial Sensing and MSIF Technology Applications

In order to verify the applicability and advantages of the proposed spatial perception system and the MSIF technique, the study analyzes each of them experimentally.

4.1. Effectiveness Analysis of IMSIF System

In the experimental analysis, the experiment used OpenBCI, an open-source hardware and software toolbox for EEG acquisition, which is capable of connecting multi-channel EEG sensors to record the electrical activity of the brain in real time and visualize the signals through a graphical interface. The MATLAB Data Acquisition Toolbox makes it easy to read and monitor sensor data in real time. The visualization uses Matplotlib to clearly display the perceptual results for different dimensions, the accuracy of action recognition, and the effect of emotion classification. After the data are obtained, the signal is divided into time windows, the data in different time windows are processed, and the time-frequency features are extracted using the wavelet or Fourier transform. The data from different sources are normalized, the basic features of the signal are calculated, and the features are standardized to facilitate model training and testing.
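The windowing, feature extraction, and standardization steps just described can be sketched as follows; the window length and the toy signal are assumptions for illustration, not the study's actual settings.

```python
import math

# Split a signal into non-overlapping time windows, compute simple per-window
# features (mean and RMS), and z-score normalize a feature column.
def window_features(x, win):
    feats = []
    for start in range(0, len(x) - win + 1, win):
        w = x[start:start + win]
        mean = sum(w) / win
        rms = math.sqrt(sum(v * v for v in w) / win)
        feats.append((mean, rms))
    return feats

def zscore(values):
    # Standardize to zero mean and unit variance (population variance).
    n = len(values)
    mu = sum(values) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    return [(v - mu) / sd for v in values] if sd > 0 else [0.0] * n

x = [0.0, 1.0, 0.0, -1.0, 2.0, 3.0, 2.0, 1.0]   # toy signal
feats = window_features(x, win=4)                # two windows of length 4
mean_col = zscore([m for m, _ in feats])         # standardized window means
```

Standardizing each feature column in this way puts heterogeneous sensor features on a common scale before they are fed into model training and testing.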

Before the experiment, the BCI competition data are selected, and five experimenters are recruited for EEG data acquisition [22]. The acquisition location is the forehead, the waveform range is 0-100 Hz, the signal emission interval is 2 s, the data precision is set to 5 bits, and the hardware response time is less than 2 ms. The subjects’ emotional data are monitored and collected through physiological signals such as heart rate variability and galvanic skin response; the emotion dataset comprises about 2,440 samples in total. The head movement dataset is collected by motion sensors, covering the angle changes of movements such as head-up, head-down, and head-turn, and is used to analyze the user’s perspective and focus shifts. The basic information of the five subjects is as follows: the average age is 37.2 years, the oldest is 45, and the youngest is 32; there are 3 males and 2 females. No subject has major illnesses or physical impairments, and no major surgeries are recorded in the last six months. Three subjects exercise regularly and two exercise occasionally. In the experiment, the study selects specific subjects for assessment tests, including mental state assessment, movement recognition, physiological data monitoring, experience in a virtual environment, and MSIF. The fundamental principles of mental state assessment are primarily based on psychological and neuroscientific theories. The use of physiological monitoring technology, such as EEG, enables observation of the electrical activity of the brain, which can provide insight into the emotional and cognitive status of individuals. Based on the principles of human-computer interaction and sports science, movement recognition helps users interact more naturally with the virtual environment and provides important feedback for sports training and rehabilitation.
Physiological data monitoring is based on biomedical and psychophysiological principles, and physiological data are often closely related to the mental state and behavior of an individual. The experience in the virtual environment is used to study the integration effect of SWDs and BCI technology. MSIF is chosen to obtain more comprehensive user status information and improve decision support by integrating information from different sources. Then, the system is tested, and four psychological states of the experimenters are selected for analysis of the band components of the EEG waves. The analysis results are shown in Table 1.

Table 1. Analysis of EEG wave band characteristics.

Rhythm category | Frequency (Hz) | Psychology | Obvious parts
--------------- | -------------- | ---------- | -------------
Alpha wave | 8-17 | Clear, quiet, and closed eyes | Occipital and parietal lobes
Beta wave | 13-31 | Cortical excitation | Anterior central gyrus and frontal lobe
Theta wave | 4-37 | Sleep | Parietal and temporal lobes
Delta wave | 0.5-200 | Deep sleep | Frontal and temporal lobes

Table 1 shows the characteristics of each EEG band of the test subjects in different mental states. When the subjects are awake, quiet, and have their eyes closed, the Alpha frequency in the occipital and parietal lobes is 8-17 Hz; when the cortex is in an excitatory state, the Beta frequency in the anterior central gyrus and frontal lobe is 13-31 Hz; when the subjects are asleep, the Theta frequency in the parietal and temporal lobes is 4-37 Hz; and when the subjects are in deep sleep, the Delta frequency in the frontal and temporal lobes is 0.5-200 Hz. The test results match the actual situation, which shows that the system of this research can effectively analyze the psychological state of the subjects. Then the subjects are immersed in the virtual world experience using the BCI competition data and smart helmets, after which six dimensions of the sense of presence are used to score the subjects’ immersive perception. The results are shown in Fig. 6.

Fig. 6. Immersion perception scores of subjects in different dimensions.


In Fig. 6, different dimensions correspond to circles of different colors, and the size of a circle indicates the subject’s score for the corresponding dimension: the larger the circle, the higher the score. Subject 1 has the highest ratings for perception and attention, and subject 2 has higher ratings for self-awareness, social interaction, and cognition. Subject 3 has a higher level of overall cognition and perception of the scenario. Subject 4 scores higher in perception and emotional response, while the last subject scores higher in cognitive level. The results show that the proposed method can understand the user’s emotional state in the IE by analyzing different dimensions and can be customized according to the user experience to improve the adaptability and flexibility of the system. Next, the accuracy of the subjects’ movement recognition while wearing smart helmets is tested to verify the sensitivity of the system to human perception. Three actions, head-down, head-up, and head-turn, are designed in the virtual scenario, and the final test results are shown in Fig. 7.

Fig. 7. Accuracy analysis of subject’s head movement recognition.

../../Resources/ieie/IEIESPC.2026.15.1.123/fig7.png

Fig. 7 illustrates the subjects' head movement recognition accuracy. Each box represents the average recognition accuracy of a movement, while the upper and lower whiskers represent the standard deviation of that accuracy. The average recognition accuracy is 95.6% for the head-up movement, 91.3% for the head-down movement, and 82.2% for the head-turn movement, and the difference in recognition accuracy between subjects is within 5%. The findings demonstrate that the proposed method is capable of adjusting its feedback in a timely manner within a dynamic environment. It can swiftly and accurately comprehend the user's intention and effectively translate the user's natural actions into a prompt system response, thereby enhancing the sense of immersion. The brainwave characteristics captured by the BCI are then used to analyze the user's emotional and mental state to verify the user's immersive perception; the results are analyzed with a confusion matrix, as shown in Fig. 8.
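A threshold-based sketch of how the three head movements might be separated from helmet orientation data is shown below. The pitch/yaw representation and the 15-degree threshold are hypothetical, chosen only to illustrate the three-class decision; the per-movement accuracies are the ones reported in Fig. 7.

```python
def classify_head_movement(pitch_delta, yaw_delta, thresh=15.0):
    """Map per-trial orientation changes (degrees) to one of the three
    movements tested in the experiment. Thresholds are illustrative."""
    if abs(yaw_delta) > abs(pitch_delta) and abs(yaw_delta) > thresh:
        return "head turn"
    if pitch_delta > thresh:
        return "head up"
    if pitch_delta < -thresh:
        return "head down"
    return "still"

# Per-movement accuracies reported in Fig. 7, averaged across subjects.
acc = {"head up": 0.956, "head down": 0.913, "head turn": 0.822}
mean_acc = sum(acc.values()) / len(acc)
print(round(mean_acc, 3))  # 0.897
```

In the actual system the decision would be learned from the fused sensor streams rather than hard-coded, but the three-way decision structure is the same.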

Fig. 8. Confusion matrix analysis of emotion recognition model.

../../Resources/ieie/IEIESPC.2026.15.1.123/fig8.png

In the emotion recognition of the subjects during the scenario experience, the emotion recognized with the highest accuracy is happiness, at 97.3%, followed by fear and anger, at 96.8% and 95.4%, respectively; the prediction accuracy for sadness and peace is around 90%. It can be concluded that, in this scenario test, the proposed system is able to sensitively recognize the emotional changes of each subject with high accuracy. The results show that the method can capture the user's emotional state in real time, provide personalized feedback to the user, and achieve emotional interaction.
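The per-class accuracies in Fig. 8 can be read off a confusion matrix as the ratio of each diagonal entry to its row sum. The matrix below is hypothetical (1000 trials per emotion), with diagonals chosen to match the reported values:

```python
import numpy as np

# Rows = true class, columns = predicted class; off-diagonal counts
# are invented for illustration, diagonals match the reported accuracies.
labels = ["happy", "fear", "anger", "sad", "peace"]
cm = np.array([
    [973,  10,   7,   5,   5],
    [ 12, 968,  10,   6,   4],
    [ 15,  14, 954,  10,   7],
    [ 30,  20,  25, 905,  20],
    [ 25,  25,  20,  28, 902],
])

per_class_acc = np.diag(cm) / cm.sum(axis=1)
for name, a in zip(labels, per_class_acc):
    print(f"{name}: {a:.1%}")
```

Reading accuracy per row (rather than overall) is what reveals that sadness and peace are the harder classes, which an aggregate accuracy figure would hide.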

4.2. Performance Analysis of IMSIF Technology

To test the performance of the MSIF technology that incorporates SWD and BCI, sensory information is collected through sensors and pre-processed. Three sensory feature sets, including smell and vision, are selected for the study, and the samples are tested using multivariate statistical analysis to assess the sensitivity of the IMSIF technology to the sensory samples. The experimental results are shown in Fig. 9.

Fig. 9. Multivariate statistical analysis results: perceptual sample feature classification.

../../Resources/ieie/IEIESPC.2026.15.1.123/fig9.png

The feature classification results in Fig. 9(a) show that the perceptual samples cluster tightly within groups and separate clearly between groups, which is verified by the permutation test in Fig. 9(b), where the variable intercepts are less than 0. The results show that the system does not suffer from overfitting, and the small classification distances between variables indicate that the system is sensitive to the features of the perceptual information. The system achieves an R2 score of 0.91 and a Q2 score of 0.89. The high intra-group clustering and clear inter-group separation indicate that the selected features discriminate situational perception efficiently. Meanwhile, the multivariate statistical analysis enhances the model's ability to process multidimensional data, so that the system maintains good performance across different scenarios and provides a guarantee for subsequent complex real-time data processing. The experiment then collects action data and perception data from the subjects. Based on the classifier, the collected data are filtered, noise-reduced, and pre-processed, and the data features are classified. The classification results are shown in Fig. 10.
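For reference, R2 and Q2 share the same formula; Q2 simply applies it to out-of-sample (cross-validated) predictions, which is why Q2 below R2 is expected and a large gap would signal overfitting. A minimal sketch on synthetic data, using a straight-line fit rather than the paper's multivariate model:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Goodness of fit: 1 - residual sum of squares / total sum of squares."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2 * x + rng.normal(0, 0.1, 50)  # synthetic linear data with noise

# Leave-one-out cross-validated predictions of a straight-line fit;
# Q^2 applies the R^2 formula to these out-of-sample predictions.
y_cv = np.empty_like(y)
for i in range(len(x)):
    mask = np.arange(len(x)) != i
    slope, intercept = np.polyfit(x[mask], y[mask], 1)
    y_cv[i] = slope * x[i] + intercept

slope, intercept = np.polyfit(x, y, 1)
r2 = r_squared(y, slope * x + intercept)  # in-sample fit
q2 = r_squared(y, y_cv)                   # cross-validated predictive power
print(r2 > q2)  # the in-sample R^2 typically upper-bounds Q^2
```

Reported values such as R2 = 0.91 with Q2 = 0.89 indicate a model whose predictive power nearly matches its fit, i.e. little overfitting.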

Fig. 10. 3D visualization of feature recognition: classification of motion data and perception data.

../../Resources/ieie/IEIESPC.2026.15.1.123/fig10.png

Fig. 10 shows that the proposed technique can effectively analyze the physiological and behavioral signals of the user, display the changes in the user's movements in three dimensions, and process and classify the features of the different perceptual and movement data. Compared with observation by the human eye, the proposed MSIF technique can objectively evaluate and classify the user's movements. The results demonstrate that the method provides an intuitive and comprehensive analysis and enables dynamic monitoring and evaluation of users.

Furthermore, it provides the requisite visual basis for improving training or treatment strategies. The predicted and actual values of the Improved Multi-Source Information Fusion (IMSIF) technology and of the MSIF technology before improvement are then compared to verify the advantages of the IMSIF technology, and the comparison results are shown in Fig. 11.

Fig. 11(b) displays the analysis of the predicted values of the improved MSIF technique, indicating a prediction accuracy of 0.98, which is 27% higher than that of the pre-improvement MSIF technique. The findings demonstrate that the enhanced method is more effective in data processing scenarios that demand high accuracy and reliability, and it plays a pivotal role in guiding the practical application of such scenarios.

To further verify the advanced nature of the proposed method, this study compares it with the Transformer model, the Long Short-Term Memory network (LSTM), and the Self-Attention network (SA), evaluating prediction accuracy, recall rate, F1 score, and processing time. The specific results are shown in Table 2.

Table 2. Improved performance comparison of multi-source information fusion.

Index              IMSIF     Transformer   LSTM      SA
Accuracy rate      0.935     0.883         0.905     0.913
Recall rate        0.913     0.834         0.878     0.892
F1 score           0.922     0.857         0.892     0.901
Processing time    150 ms    220 ms        260 ms    230 ms

The results in Table 2 illustrate that, in terms of accuracy, IMSIF achieves 0.935, Transformer 0.883, LSTM 0.905, and SA 0.913. Although the comparison models perform reasonably on multi-source data fusion tasks, they lack adaptability when processing complex user interaction data. The recall rate of IMSIF reaches 0.913, a clear advantage over the comparison algorithms, indicating that the technology can capture user state changes while reducing missed detections. The F1 score of IMSIF reaches 0.922, indicating that the method can effectively handle data imbalance. In terms of processing time, IMSIF is 70 ms, 110 ms, and 80 ms faster than Transformer, LSTM, and SA, respectively, indicating that the method meets the response requirements of real-time interaction.
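As a consistency check on Table 2, the reported F1 column is reproduced (to within rounding) by taking the harmonic mean of the accuracy and recall columns, i.e. treating the accuracy rate as the precision term; whether the paper's "accuracy rate" is in fact precision is an assumption here, suggested only by the numerical agreement.

```python
def f1(p, r):
    """Harmonic mean of a precision-like score p and recall r."""
    return 2 * p * r / (p + r)

# (accuracy, recall, reported F1) per model, from Table 2.
table = {
    "IMSIF":       (0.935, 0.913, 0.922),
    "Transformer": (0.883, 0.834, 0.857),
    "LSTM":        (0.905, 0.878, 0.892),
    "SA":          (0.913, 0.892, 0.901),
}
for name, (acc, rec, f1_reported) in table.items():
    # Each reported F1 matches the harmonic mean to within rounding error.
    assert abs(f1(acc, rec) - f1_reported) < 0.002, name
print("F1 column consistent with accuracy and recall columns")
```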

5. Discussion

In the experimental results, the EEG data of the subjects in different mental states showed that the system could effectively identify the different EEG frequency bands, consistent with the subjects' psychological states. Based on the smart helmet data, the movement detection accuracy of the subjects reached more than 90%, and the detection accuracy of emotional states was high, reaching up to 97.3% for happiness. The prediction accuracy of the IMSIF technology reached 0.98. Regarding the correlation between immersion and mental state, when the subjects were in a relaxed and happy emotional state, the system was more sensitive in its EEG analysis and could efficiently detect changes in alpha and beta waves. The underlying reason may be the close link between physiological states and brain activity, which provided support for biofeedback mechanisms. When users felt immersed, the changes in their physiological indicators exhibited greater stability and consistency. In terms of movement recognition, the smart helmet obtained more comprehensive movement information by fusing a variety of sensors, and the adopted method showed strong adaptability in processing and classifying movement data. In terms of sensitivity to emotion recognition, a fully trained machine learning model improved the recognition accuracy of emotional changes, and the use of MSI effectively enhanced the accuracy of emotion recognition. In a study of the same type, Daas et al. proposed a secure multimodal biometric system based on a deep learning approach; the experimental results showed that the proposed fusion architecture achieved 99.89% accuracy and a 0.05% equal error rate, demonstrating that the deep-learning-based biometric system was safe, robust, and reliable [23]. Tyagi et al. used facial and finger vein features for identification and authentication, with deep convolutional neural networks extracting the features and the fusion performed at the score level. The experimental results on all of the considered public databases showed significant improvements in identification and authentication accuracy as well as in equal error rates [24]. Compared with these studies, the intelligent method proposed here is beneficial for enhancing head movement and emotion recognition, and the findings are consistent.

Fig. 11. Effect of multi-source information fusion technology on movement recognition before and after improvement.

../../Resources/ieie/IEIESPC.2026.15.1.123/fig11.png

6. Conclusion

Traditional IE technology does not analyze users' physiology or recognize their movements. This research introduced a novel spatial perception system for IE by combining SWD and BCI techniques. The MSIF technology accurately detected the user's state and environment, and experimental validation confirmed the system's effectiveness in analyzing various psychological states in subjects. The immersive spatial perception system enabled users to experience virtual space immersion. In head movement recognition, the average recognition accuracy was 95.6% for head-up movements, 91.3% for head-down movements, and 82.2% for head movements tilted left or right, and the difference in recognition accuracy among subjects was within 5%. In emotion recognition, the highest accuracy, 97.3%, was achieved for the happy emotion. The IMSIF technology achieved an R2 score of 0.91 and a Q2 score of 0.89 in the classification performance analysis. Furthermore, the improved MSIF technology displayed a prediction accuracy of 0.98, a 27% increase compared with the pre-improvement system. In summary, the proposed system and improvement techniques can provide users with a realistic, natural, efficient, safe, and comfortable IE, which holds great application value and potential. However, a limitation of this study is the lack of research on individual differences in EMG signals, indicating a need for further exploration in follow-up work. Future studies can consider how to adapt the immersive experience to the individual differences of users: collecting and analyzing additional user data makes it possible to construct personalized models, which in turn enables immersive, interactive environments that are better tailored to individual needs.
As smart devices continue to evolve, the fusion of multimodal data will play a greater role in immersive experiences. Future research can further explore how to efficiently integrate such multi-source data to improve the accuracy and application range of data fusion.

Funding

The research is supported by the Natural Science Foundation of Guangdong Province (No. 2022A1515140120).

References

[1] Fang Y., Liu H., Wang J., Li Z., Zhang Q., 2022, ST-SIGMA: spatio-temporal semantics and interaction graph aggregation for multi-agent perception and trajectory forecasting, CAAI Transactions on Intelligence Technology, Vol. 7, No. 4, pp. 744-757.
[2] Yang Y., Song X., Li J., Chen H., 2022, Research on face intelligent perception technology integrating deep learning under different illumination intensities, Journal of Computational and Cognitive Engineering, Vol. 1, No. 1, pp. 32-36.
[3] Lu K., Karlsson J., Dahlman A. S., Sjöqvist B. A., Candefjord S., 2022, Detecting driver sleepiness using consumer wearable devices in manual and partial automated real-road driving, IEEE Transactions on Intelligent Transportation Systems, Vol. 5, pp. 23-34.
[4] Yu Q., Zhang H., Wang L., Chen Y., 2022, Ti3C2Tx MXene/polyvinyl alcohol decorated polyester warp knitting fabric for flexible wearable strain sensors, Textile Research Journal, Vol. 92, No. 6, pp. 810-824.
[5] Kim S., Lee J., Park H., Choi M., 2023, Designing an XAI interface for BCI experts: a contextual design for pragmatic explanation interface based on domain knowledge in a specific context, International Journal of Human-Computer Studies, Vol. 5, No. 12, pp. 235-246.
[6] Algargoosh A., Al-Kodmany M., Hussein S., 2022, The impact of the acoustic environment on human emotion and experience: a case study of worship spaces, Building Acoustics, Vol. 29, No. 1, pp. 85-106.
[7] Patterson D. R., Hoffman E., Carrougher M., Sharar S., 2023, A comparison of interactive immersive virtual reality and still nature pictures as distraction-based analgesia in burn wound care, Burns, Vol. 49, No. 1, pp. 182-192.
[8] Putra P. G., Rahman A., Nugroho S., Hidayat T., 2024, Virtual reality as an immersive projective and autodriving advancement technique, Journal of Consumer Behaviour, Vol. 23, No. 2, pp. 711-726.
[9] Wong L. K., 2023, Learning game innovations in immersive game environments: a factor analytic study of students' learning inventory in virtual reality, Virtual Reality, Vol. 27, No. 3, pp. 2331-2339.
[10] Carlo M. S., Bianchi R., Rossi L., Conti F., 2023, Improving real-world skills in people with intellectual disabilities: an immersive virtual reality intervention, Virtual Reality, Vol. 27, No. 4, pp. 3521-3532.
[11] Wang B., Li Y., Zhang H., Liu Q., 2022, LIALFP: multi-band images synchronous fusion model based on latent information association and local feature preserving, Infrared Physics & Technology, Vol. 120, pp. 232-254.
[12] Yang L., Zhang J., Li H., Wang X., 2020, Multi-granulation method for information fusion in multi-source decision information system, International Journal of Approximate Reasoning, Vol. 122, pp. 47-65.
[13] Wang L., Zhao Y., Liu H., Chen Z., 2023, Comprehensive evaluation method for dam safety considering fusion weight optimization and conflicting information source, Journal of Tsinghua University (Science and Technology), Vol. 63, No. 10, pp. 1566-1575.
[14] Zhu C., Li Y., Wang J., Sun X., 2023, A belief Rényi divergence for multi-source information fusion and its application in pattern recognition, Applied Intelligence, Vol. 53, No. 8, pp. 8941-8958.
[15] Hua Z., Liu Q., Chen Y., Wang H., 2023, An improved belief Hellinger divergence for Dempster-Shafer theory and its application in multi-source information fusion, Applied Intelligence, Vol. 53, No. 14, pp. 17965-17984.
[16] Song S., Zhang Y., Liu H., Chen X., 2024, Multi-source information fusion meta-learning network with convolutional block attention module for bearing fault diagnosis under limited dataset, Structural Health Monitoring, Vol. 23, No. 2, pp. 818-835.
[17] Xu C., 2023, Immersive animation scene design in animation language under virtual reality, SN Applied Sciences, Vol. 5, No. 1, pp. 1-11.
[18] Cen L., Zhang Y., Li H., Wang J., 2020, Augmented immersive reality for improved learning performance: a quantitative evaluation, IEEE Transactions on Learning Technologies, Vol. 13, No. 2, pp. 283-296.
[19] Krell F., Schubert M., Meyer J., Reuter A., 2023, Corporeal interactions in VRChat: situational intensity and body synchronization, Symbolic Interaction, Vol. 46, No. 2, pp. 159-181.
[20] Hong Y., Zhang L., Wu J., Chen Q., 2022, Application of convolution neural networks-based hierarchical perception technology in 3D clothing designs, IET Networks, Vol. 11, No. 4, pp. 213-225.
[21] Valluripally S., Kumar R., Singh P., Sharma A., 2023, Detection of security and privacy attacks disrupting user immersive experience in virtual reality learning environments, IEEE Transactions on Services Computing, Vol. 16, No. 4, pp. 2559-2574.
[22] Dagdevir E., Karabayir M., Yildirim A., Aydin S., 2023, Determination of effective signal processing stages for brain-computer interface on BCI competition IV dataset 2b: a review study, IETE Journal of Research, Vol. 69, No. 6, pp. 3144-3155.
[23] Daas S., Benamara M., Khelifi A., Djeddi R., 2020, Multimodal biometric recognition systems using deep learning based on the finger vein and finger knuckle print fusion, IET Image Processing, Vol. 14, No. 15, pp. 3859-3868.
[24] Tyagi S., Verma A., Singh R., Gupta P., 2022, Multi-modal biometric system using deep learning based on face and finger vein fusion, Journal of Intelligent & Fuzzy Systems, Vol. 42, No. 2, pp. 943-955.
Jianfeng Huang
../../Resources/ieie/IEIESPC.2026.15.1.123/au1.png

Jianfeng Huang received his Ph.D. degree from South China University of Technology in 2016. He has been a professor at the Key Laboratory of Electronic Functional Materials and Devices at Huizhou University in Guangdong, China, since 2019. His research interests include intelligent manufacturing, multi-source information fusion, and predictive maintenance of mechanical equipment.

Qiang Wan
../../Resources/ieie/IEIESPC.2026.15.1.123/au2.png

Qiang Wan obtained his master’s degree in computer engineering from South China University of Technology in 2013. Currently, he is working as an engineer at South China University of Technology. He holds multiple skill certifications, including Senior Network Engineer, Data Analyst Engineer, Google & MIT App Inventor Lecturer Qualification, ITSS Service Manager Qualification, Oracle Certified Professional (OCP), etc. He has been invited as a resource person to give various technical speeches on image processing, pattern recognition, and software computing. He has also served as a member of several domestic and international computer technology organizations, and has published articles in several internationally renowned peer-reviewed journals and conference proceedings. His areas of interest include research management, machine learning, and IoT device development.