

  1. (School of Information Engineering, Yancheng Institute of Technology, Yancheng, 224000, China)



Robotics, Sensors, SLAM, Feature fusion, Inertial measurement unit

1. Introduction

The development of intelligent robotics has greatly expanded the application areas of Wheeled Robots (WR). Localization technology is a core technology related to the autonomous mobility of WR, and the most common localization method is the Simultaneous Localization and Mapping (SLAM) algorithm [1, 2]. SLAM algorithms are categorized into laser SLAM and visual SLAM; laser SLAM has good accuracy but is more costly, while visual SLAM is cheaper and its equipment is easier to install. For visual SLAM, the commonly used camera sensors are the Monocular Camera (Mono), the binocular camera, and the Red Green Blue Depth (RGB-D) camera. However, a Mono has difficulty preserving scale information [3, 4]. With the development of deep learning technology, scholars have gradually begun to combine deep learning and visual SLAM, but this approach relies heavily on big data, and it is difficult to construct a dataset suitable for all scenes [5]. Thus, the study developed an enhanced algorithm that utilizes Multi-Feature Fusion (MFF) of point, line, and surface features, as well as a multi-sensor fusion algorithm. The study aims to improve the localization accuracy of the visual SLAM algorithm so that it can adapt to complex scenes such as low texture and robot steering, enhance the autonomous mobility of WR, and improve the efficiency of the robot. The study is organized into four sections. The first is a review of previous research on robot SLAM algorithms. The architecture of the multi-sensor fusion improvement algorithm and the MFF improvement algorithm is covered in the second section. The enhanced algorithm's performance is verified in the third section. The study's conclusions, limitations, and future directions are covered in the fourth section. Compared with the Unified Point Line Plane (UPLP-SLAM) algorithm that integrates point, line, and surface features, the advantage of the multi-feature fusion improvement algorithm designed in this paper is that it uses the Manhattan coordinate system to construct corresponding measurement error terms, reducing errors and improving accuracy through backend optimization. The advantage of the multi-sensor fusion improvement algorithm is that it solves the accuracy degradation and tracking loss problems of the UPLP-SLAM algorithm in more challenging scenarios such as feature sparsity and robot turning. In addition, compared with the UPLP-SLAM algorithm, the disadvantage of the multi-feature fusion improvement algorithm designed in this paper is that it requires a large amount of computation and has low real-time performance.

The study presents two innovations. The first is the combination of surface, line, and point features, and the design of an improved algorithm based on multi feature fusion, which improves the accuracy and reconstruction effect of the algorithm and reduces cumulative errors. The second is the combination of several sensors, namely monocular cameras, inertial measurement units, and encoders, achieving an improvement in the positioning accuracy of wheeled robots in more challenging scenarios and solving the problem of inaccurate positioning of wheeled robots.

2. Related Works

The evolution of civilization and advances in science and technology have encouraged the rise of intelligent robots in numerous social domains, particularly wheeled mobile robots. Many recent studies center on SLAM algorithms with the goal of enhancing the capacity of wheeled mobile robots to navigate autonomously. Lan and other researchers designed an anchor point deployment strategy for robotic SLAM incorporating a stereo vision system for the indoor patrolling problem of service robots. With this strategy, the robot was able to perform autonomous localization and environment map construction simultaneously. In addition, the study used RFID to shorten the path along which the robot autonomously constructed the map. Finally, the study combined the constructed map with the A* algorithm to accomplish the patrol task. The results of the study revealed that the maps constructed by this strategy have a good mapping effect [6]. For large and heavy object handling, experts such as Recker T designed a nonholonomic mobile manipulation approach that combines a single mobile manipulator and a roller board. In addition, to localize the objects, the study used a depth camera for 3D perception and SLAM navigation. The experimental results showed that the method designed in this study was able to perform large and heavy object handling tasks well [7]. Cremona et al. assessed the accuracy and processing time of the most recent visual-inertial odometry systems and SLAM systems on agricultural fields in order to address the difficulties such systems experience in that setting. The evaluation used sensor data from WR in a soybean field. Experimental results revealed that factors such as the highly repetitive environmental appearance in arable fields cause visual difficulties that affect the performance of visual-inertial odometry systems and SLAM systems, and that state-of-the-art SLAM systems still require further optimization [8]. Chen and other researchers designed an adaptive neural-network-based control method to control the trajectory tracking of uncertain wheeled mobile robots with velocity constraints and nonholonomic constraints. In addition, the study employed a barrier Lyapunov function. The method's strong performance and its ability to achieve trajectory tracking control of the robot were demonstrated by the experimental results [9]. These methods have shown some effectiveness in improving the autonomous movement ability of wheeled robots, but they also suffer from insufficient accuracy, time-consuming feature extraction and matching, and easy tracking loss when facing low-texture scenes. In addition, deep learning relies on big data, making it difficult to establish datasets suitable for all scenarios.

Trybala P and other experts designed a wheeled mobile robot for exploration and surveying in underground mining areas, based on the advantages of mobile robots in environments that are unfriendly to humans. The study also used a survey-grade laser scanner and a SLAM solution and was able to provide a 3D representation of the underground environment in the mining area. The experimental results showed that the robot designed in this study had high accuracy and could be utilized in other environments that are not human friendly [10]. Yang and other researchers created a kinematic model of the mobile robot and utilized an RGB-D camera and a visual tracking controller to describe the learning-from-demonstration task of a wheeled mobile robot. Furthermore, the kinematic model of the mobile manipulator served as the foundation for the construction of the visual tracking controller. The testing results showed that the designed learning system performs well and that the wheeled mobile manipulator picks up task requirements by observation alone, without the need for manual preparation [11]. A new ultra-wide field of view image dataset was presented by Benseddik and colleagues for the purpose of performance validation of vision-based robot motion estimation methods. The dataset involved many panoramic cameras, robotic platforms, etc., and was also of a diverse type, enabling effective validation of methods such as visual SLAM. Experimental results revealed that the dataset presented in this study can better validate the performance of robot motion estimation algorithms [12]. Experts such as Vasilopoulos used an online reactive scheme for the planar navigation problem. The method employed SLAM and reconstructed geometric knowledge, and the study also generated a vector field planner. Experimental results showed that the method designed in this study has good robustness and a relatively moderate computational cost [13]. Yang F and other researchers aimed to address the issue of visual SLAM algorithms being prone to failure in low-texture artificial environments with insufficient point features. They adopted line-based structural features from the Manhattan world and designed a new monocular SLAM system that integrates feature points and structural lines. Based on the structural characteristics of the Manhattan world, the study also proposed a new module optimization strategy. The results show that the proposed system outperforms state-of-the-art monocular SLAM systems in terms of accuracy and robustness in artificial environments [14]. However, these methods can suffer accuracy degradation and tracking loss in more challenging scenarios such as feature sparsity and robot turning, affecting the autonomous movement ability of wheeled robots. In addition, recognizing surface features solely through color images not only fails to obtain surface scale information, but also easily leads to missed and false detections.

To summarize, current research on wheeled mobile robots is relatively rich, and many studies use SLAM. However, SLAM algorithms also have certain shortcomings, such as tracking loss when facing low-texture scenes, and accuracy degradation and tracking loss in more challenging scenarios such as feature sparsity and robot turning. In response to the first problem, the research starts from the RGB-D camera and designs an improved algorithm based on point, line, and surface (PLS) MFF. In response to the second problem, and also to make up for the shortcomings of the improved multi-feature fusion algorithm, the study also designs an improved multi-sensor fusion algorithm that fuses the Mono, an Inertial Measurement Unit (IMU), and an encoder. The innovation of the study is the combination of PLS features with multiple sensors, along with the design of a joint initialization and sliding window optimization process for the multiple sensors.

3. Design of Improved ORB-SLAM2 and VINS-Mono Algorithms for Wheeled Robots

In this section, for the shortcomings of SLAM algorithm in low texture scenes, the study designs corresponding solution algorithms from the perspective of PLS MFF. Aiming at the problem of tracking loss of MFF algorithm in complex scenes such as robot steering, the study designs the corresponding solution algorithm from the perspective of multi-sensor fusion.

3.1. Improved Design of ORB-SLAM2 Algorithm Based on Fusion of PLS Features

In improving WR autonomous mobility, traditional SLAM systems suffer from accuracy degradation in low-texture scenes and from tracking loss during robot steering. To solve the problem of accuracy degradation in low-texture scenarios, the study starts from the RGB-D camera, for which ORB-SLAM2 is selected for improvement [15]. The specific improvement measure is to perform PLS MFF. To address the issue of robot tracking loss in steering scenarios, the research develops an enhanced algorithm that utilizes multi-sensor fusion. Fig. 1 depicts the main flow of the improved ORB-SLAM2 algorithm.

Fig. 1. The main process of improving the ORB-SLAM2 algorithm.

../../Resources/ieie/IEIESPC.2026.15.2.293/fig1.png

From Fig. 1, it can be seen that the improved ORB-SLAM2 algorithm mainly consists of three parts, namely image input, visual odometry, and backend optimization. The image input includes color and depth maps, while visual odometry covers the extraction and processing of point, line, and surface features, pose estimation, and keyframe selection. Backend optimization includes Manhattan Frame (MF) extraction and processing, the filtering of keyframes, and related steps. In the improved ORB-SLAM2 algorithm, the role of the visual odometry module is to determine whether to insert the current frame as a keyframe into the backend. In this process, uninitialized features first need to construct an initial map, while initialized features need to estimate the pose through nonlinear optimization. Subsequently, whether to insert a keyframe is determined based on tracking quality and backend optimization. Backend optimization is responsible for processing keyframes, reconstructing new map points and lines through triangulation, and estimating state variables such as keyframe poses and the positions of points, lines, and planes in the local map using nonlinear optimization methods. At the same time, the study uses the alignment characteristics of MFs to correct potential errors in triangulation reconstruction and improve its accuracy. By improving the performance of the ORB-SLAM2 algorithm with an RGB-D camera, the tracking accuracy of the robot can be enhanced.

Image features generally contain two parts: key regions and descriptors, where key regions involve key points, lines, and planes [16]. The Agglomerative Hierarchical Clustering (AHC) plane features are improved by adjusting the mean-filter window size according to the depth value and performing adaptive mean filtering on the depth map. In visual odometry, the extraction of point features treats the pixel coordinates (PC) as the observation, as shown in Eq. (1).

(1)
$ p = [U, V]^T. $

In Eq. (1), $U$ and $V$ represent the horizontal and vertical coordinates of the pixel, respectively. Eq. (2) displays the observed values of the line features.

(2)
$ l = \frac{\Pi(p_s) \times \Pi(p_e)}{\|\Pi(p_s)\| \cdot \|\Pi(p_e)\|}. $

In Eq. (2), $\Pi(\cdot)$ denotes taking the homogeneous coordinates, $\|\cdot\|$ is the norm of a vector, and $p_s$ and $p_e$ are the PC of the line's two endpoints.
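As a minimal illustration of Eqs. (1) and (2), the following Python sketch builds the line observation from two endpoint pixels; the function name and the use of NumPy are illustrative only and are not part of the original implementation.

```python
import numpy as np

def line_observation(p_s, p_e):
    """Eq. (2) sketch: normalized line observation from two endpoint pixels,
    where Pi(p) = [u, v, 1] takes the homogeneous coordinates of a pixel."""
    hs = np.array([p_s[0], p_s[1], 1.0])   # Pi(p_s)
    he = np.array([p_e[0], p_e[1], 1.0])   # Pi(p_e)
    return np.cross(hs, he) / (np.linalg.norm(hs) * np.linalg.norm(he))
```

The filtered depth map is shown in Eq. (3).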

(3)
$ I(x, y) = \left( \frac{S(x+r, y+r) - S(x-r, y+r) - S(x+r, y-r) + S(x-r, y-r)}{4r^2} \right). $

In Eq. (3), $I(x, y)$ and $S(x, y)$ are the pixel values at coordinate $(x, y)$ in the filtered depth map and in the integral image $S$, respectively, and $r$ is the product of the pixel's depth value and the camera constant.
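The box sum in Eq. (3) can be evaluated in constant time per pixel from the integral image. A hedged sketch follows, in which the window half-size $r$ grows with the depth value; the helper name and the camera constant value are assumptions for illustration only.

```python
import numpy as np

def adaptive_mean_filter(depth, camera_constant=0.01):
    """Eq. (3) sketch: depth-adaptive box filtering of the depth map via an integral image."""
    h, w = depth.shape
    S = np.zeros((h + 1, w + 1))                        # padded integral image
    S[1:, 1:] = np.cumsum(np.cumsum(depth, axis=0), axis=1)
    out = np.zeros_like(depth, dtype=np.float64)
    for y in range(h):
        for x in range(w):
            r = max(1, int(depth[y, x] * camera_constant))   # window grows with depth
            y0, y1 = max(0, y - r), min(h, y + r)
            x0, x1 = max(0, x - r), min(w, x + r)
            box = S[y1, x1] - S[y0, x1] - S[y1, x0] + S[y0, x0]
            out[y, x] = box / ((y1 - y0) * (x1 - x0))        # ~4r^2 away from the borders
    return out
```

The improved AHC surface features are shown in Eq. (4).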

(4)
$ \Upsilon' = \left[ \arctan\left(\frac{N_y}{N_x}\right), \arcsin(N_z), d_\Upsilon \right]^T. $

In Eq. (4), $d_\Upsilon$ represents the distance between the surface and the origin of the camera coordinate system of the current frame, and $N = [N_x, N_y, N_z]^T$ is the normal vector.
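A minimal sketch of the plane parameterization in Eq. (4), assuming the plane has already been detected and is given by its unit normal and its distance to the camera origin:

```python
import numpy as np

def plane_observation(normal, d):
    """Eq. (4) sketch: compact plane observation [arctan(Ny/Nx), arcsin(Nz), d];
    arctan2 is used as the numerically robust form of arctan(Ny/Nx)."""
    nx, ny, nz = np.asarray(normal, dtype=float) / np.linalg.norm(normal)
    return np.array([np.arctan2(ny, nx), np.arcsin(nz), d])
```

Eq. (5) illustrates how the location of the $h$th map point in the world coordinate system (WCS) is determined during initialization.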

(5)
$ p_h^w = d_h \Pi^{-1}(p_h). $

In Eq. (5), $d_h$ and $p_h$ represent the depth and PC of the $h$th feature point, respectively, and $\Pi^{-1}$ is the inverse projection function. For pose estimation, the study uses nonlinear optimization methods to estimate the current frame pose; the commonly used nonlinear optimization methods are the Levenberg-Marquardt (LM) method and the Gauss-Newton method [17].
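For illustration, a sketch of the back-projection in Eq. (5) with a pinhole camera model; the default intrinsics are those quoted for the TUM Kinect in Section 4.1 and are placeholders, not the platform's calibration.

```python
import numpy as np

def back_project(p, depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """Eq. (5) sketch: p^w = d * Pi^{-1}(p), assuming the first camera frame
    is taken as the world frame during initialization."""
    u, v = p
    return depth * np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
```

Fig. 2 illustrates the process of applying LM in SLAM to solve the least squares problem.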

Fig. 2. The process of using LM to solve the least squares problem in SLAM.

../../Resources/ieie/IEIESPC.2026.15.2.293/fig2.png

Fig. 3. Points and lines triangulation.

../../Resources/ieie/IEIESPC.2026.15.2.293/fig3.png

From Fig. 2, it can be seen that the key to using LM to solve the least squares issue is to solve the Jacobian matrix and increment, and update the multidimensional state variables. In the back-end optimization, MFs are extracted mainly in the 3D lines and surfaces of the camera coordinate system of the current frame. In updating the local maps, the study adopts the triangulation method of 2D points and the triangulation method of 2D lines. The triangulation of points and lines is shown schematically in Fig. 3.

In Fig. 3, $O_n$ and $O_k$ denote the optical centers of the two cameras. $p_s^{c_n}$ and $p_e^{c_n}$ are the 2D endpoints of the line in frame $n$, while $p_s^{c_k}$ and $p_e^{c_k}$ are the 2D endpoints of the line in the current frame, and $\pi'$ is the plane. $p^{c_k}$ and $p^{c_n}$ are the 2D point of the current frame and the matching 2D point of frame $n$, respectively. The local map optimization of points, lines, planes, and MFs also adopts the LM method. The screening of keyframes removes redundant keyframes, where a keyframe is judged redundant if 90% of its PC are observed by other keyframes.
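As a hedged sketch of the LM update summarized in Fig. 2, the following shows one damped Gauss-Newton step on a generic least-squares problem; the residual and Jacobian callables are placeholders rather than the actual point-line-plane error terms.

```python
import numpy as np

def lm_step(residual_fn, jacobian_fn, x, lam):
    """One Levenberg-Marquardt step: solve (J^T J + lam*I) dx = -J^T r and
    accept the step only if the squared residual decreases."""
    r, J = residual_fn(x), jacobian_fn(x)
    dx = np.linalg.solve(J.T @ J + lam * np.eye(x.size), -J.T @ r)
    x_new = x + dx
    if np.sum(residual_fn(x_new) ** 2) < np.sum(r ** 2):
        return x_new, lam * 0.5      # good step: relax the damping
    return x, lam * 2.0              # bad step: keep x, increase the damping
```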

3.2. Design of Improved VINS-Mono Algorithm Based on Multi-sensor Fusion

In the previous section, the improved ORB-SLAM2 algorithm based on PLS MFF was designed for the SLAM accuracy degradation problem in low-texture scenes. However, that algorithm suffers from problems such as tracking loss in complex scenes such as feature sparsity and robot steering. To solve this problem, the study designed an improved multi-sensor fusion algorithm. The algorithm fuses the Mono, IMU, and encoder; the Mono and IMU form a Monocular Visual Inertial Navigation System (Mono-VINS), so the short name of this improved algorithm is the improved Mono-VINS algorithm. The flow of the algorithm is shown in Fig. 4.

Fig. 4. Process of improved Mono-VINS algorithm based on multi-sensor fusion.

../../Resources/ieie/IEIESPC.2026.15.2.293/fig4.png

From Fig. 4, it can be seen that the improved Mono-VINS algorithm mainly includes sensor information acquisition, sensor preprocessing, vision-IMU-encoder joint initialization, vision-IMU-encoder sliding window optimization, and vision-IMU-encoder closed-loop optimization. Among them, sensor preprocessing includes point feature extraction and processing, IMU pre-integration, and so on. Vision-IMU-encoder closed-loop optimization mainly includes closed-loop detection, closed-loop optimization, and the keyframe database. In the improved Mono-VINS algorithm, sensor preprocessing is responsible for processing the raw data of the monocular camera, IMU, and encoder. The vision-IMU-encoder joint initialization module needs to calculate the initial state variable values for sliding window optimization. Vision-IMU-encoder sliding window optimization requires inserting keyframes into the closed-loop optimization, and this process requires estimating the state variables of each image frame in the sliding window at different times. Vision-IMU-encoder closed-loop optimization requires matching the current keyframe with the keyframe database. Eq. (6) illustrates the measurement model under discrete time in IMU preprocessing [18].

(6)
$ \begin{cases} \partial'_{m+q/n} = \partial_{m+q/n} + R_w^{b_{m+q/n}} g^w + b_{\partial_m} + Z_\partial, \\ \omega'_{m+q/n} = \omega_{m+q/n} + b_{\omega_m} + Z_\omega. \end{cases} $

In Eq. (6), $\partial'_{m+q/n}$ and $\omega'_{m+q/n}$ are the measured acceleration and angular velocity of the airframe, $m$ and $q$ represent frame $m$ and frame $q$, respectively, and $g^w$ represents the gravity vector in the WCS. $R_w^{b_{m+q/n}}$ is the rotation matrix of the body coordinate system from the WCS at the moment of the IMU data in frame $q$. Additionally, $\partial_{m+q/n}$ and $\omega_{m+q/n}$ stand for the true values of acceleration and angular velocity, respectively. The Gaussian measurement noise of acceleration and angular velocity is represented by $Z_\partial$ and $Z_\omega$, respectively, while the acceleration and angular velocity biases between the $m$th and $(m+1)$th frame images are represented by $b_{\partial_m}$ and $b_{\omega_m}$. The IMU observations are obtained by integrating the IMU measurements, and position estimation is performed by calculating the body position at the $(m+1)$th frame.
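A minimal sketch of the discrete-time measurement model in Eq. (6); the function synthesizes one biased, noisy accelerometer/gyroscope reading, and its argument names are illustrative.

```python
import numpy as np

def imu_measurement(a_true, w_true, R_wb, g_w, b_a, b_w, sigma_a, sigma_w):
    """Eq. (6) sketch: measured value = true value + gravity term (accelerometer only)
    + bias + Gaussian noise, with R_wb rotating the world frame into the body frame."""
    a_meas = a_true + R_wb @ g_w + b_a + np.random.normal(0.0, sigma_a, 3)
    w_meas = w_true + b_w + np.random.normal(0.0, sigma_w, 3)
    return a_meas, w_meas
```

The airframe velocity is calculated as shown in Eq. (7) [19].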

(7)
$ \begin{cases} v_{e_m} = \frac{v_{right} + v_{left}}{2}, \\ \omega_{e_m} = \frac{v_{right} - v_{left}}{D}. \end{cases} $

In Eq. (7), $v_{e_m}$ is the linear velocity and $\omega_{e_m}$ is the angular velocity of the airframe; $v_{right}$ and $v_{left}$ are the speeds of the right and left wheels, respectively, and $D$ is the distance between the left and right wheels.
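Eq. (7) is the standard differential-drive model; a one-line sketch:

```python
def wheel_odometry(v_left, v_right, wheel_base):
    """Eq. (7): body linear and angular velocity from the left/right wheel speeds."""
    return (v_right + v_left) / 2.0, (v_right - v_left) / wheel_base
```

The encoder speed $v_{e_m}^{e_m}$ at the moment of the $m$th frame is shown in Eq. (8).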

(8)
$ v_{e_m}^{e_m} = \left[ v_{e_m} \quad 0 \quad 0 \right]^T. $

The IMU velocity observation $v_{b_m}^{b_m}$ is shown in Eq. (9).

(9)
$ v_{b_m}^{b_m} = R_e^b \left( v_{e_m}^{e_m} + \omega_{e_m} \left[ (p_e^b)_\alpha - (p_e^b)_\beta \quad 0 \right]^T \right). $

In Eq. (9), $R_e^b$ represents the rotation of the encoder with respect to the IMU, and $(p_e^b)_\alpha$ and $(p_e^b)_\beta$ denote the $\alpha$ and $\beta$ components of the encoder displacement $p_e^b$ with respect to the IMU, respectively. The slip factor is calculated as shown in Eq. (10) [20].

(10)
$ \phi_m = \begin{cases} \exp\left[ -\frac{(\omega_{e_m} - \omega_{b_m})^2}{\delta} \right], & |\omega_{e_m} - \omega_{b_m}| \le \varepsilon, \\ 0, & |\omega_{e_m} - \omega_{b_m}| > \varepsilon. \end{cases} $

In Eq. (10), $\omega_{b_m}$ represents the instantaneous angular velocity of the airframe, $\delta$ is an adjustment parameter, and $\varepsilon$ is the threshold value. The smaller the value of the slip factor, the more severe the data distortion.
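A direct sketch of Eq. (10); the adjustment parameter and threshold values below are assumptions chosen for illustration.

```python
import numpy as np

def slip_factor(w_encoder, w_imu, delta=0.05, eps=0.5):
    """Eq. (10): compare the encoder and IMU angular rates; a small factor flags wheel slip."""
    diff = abs(w_encoder - w_imu)
    return float(np.exp(-diff ** 2 / delta)) if diff <= eps else 0.0
```

The main steps of the joint vision-IMU-encoder initialization are shown in Fig. 5.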

Fig. 5. The main steps of visual IMU encoder joint initialization.

../../Resources/ieie/IEIESPC.2026.15.2.293/fig5.png

Fig. 6. The main process of visual IMU encoder sliding window optimization.

../../Resources/ieie/IEIESPC.2026.15.2.293/fig6.png

In Fig. 5, the first step of the joint vision-IMU-encoder initialization is the computation of the camera positions, the 2D point features, the 3D positions of the points, and the map point positions. The second step is to compute the optimal solution of the IMU angular velocity bias over all image frames. The third step is to determine whether the optimal solution has changed significantly; if so, re-integrate, and if not, update the IMU observations. The fourth step is to compute the airframe velocity, scale factor, and gravity vector. The next task is to determine whether the modulus of the gravity vector is close to the acceleration of gravity. If so, the modulus of the gravity vector is set to the acceleration of gravity; otherwise the process ends. The sixth step is to define the WCS origin to coincide with the reference coordinate system (RCS) origin, completing the initialization and ending the process. The optimal solution of the IMU angular velocity bias $b_\omega$ is calculated as shown in Eq. (11).

(11)
$ b'_\omega = \text{argmin}_{b_\omega} \sum_{m=0}^{M-1} \left\| \left( q_{b_{m+1}}^{c_0} \right)^{-1} \otimes q_{b_m}^{c_0} \otimes \gamma_{b_{m+1}}^{b_m} \right\|^2. $

In Eq. (11), $q_{b_m}^{c_0}$ represents the attitude of the body in the RCS, $\otimes$ is quaternion multiplication, and $M$ is the total number of image frames used for initialization. $\gamma_{b_{m+1}}^{b_m}$ is the relative rotation between the two images, and $q_{b_{m+1}}^{c_0}$ is the attitude of the body in the RCS at the moment of the $(m+1)$th image. $\|\cdot\|^2$ is the squared Euclidean norm.
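One term of the cost in Eq. (11) can be sketched with SciPy rotations as below; taking the vector part of the error quaternion as the minimized residual follows common visual-inertial initialization practice (e.g., VINS-Mono) and is an assumption here, not a detail given in the paper.

```python
from scipy.spatial.transform import Rotation as R

def gyro_bias_residual(q_c0_bm, q_c0_bm1, gamma_bm_bm1):
    """Eq. (11) sketch: error between the vision-derived relative rotation and the
    IMU pre-integrated rotation gamma; all arguments are scipy Rotation objects."""
    q_err = q_c0_bm1.inv() * q_c0_bm * gamma_bm_bm1
    return q_err.as_quat()[:3]        # vector (x, y, z) part of the error quaternion
```

The airframe velocity, gravity vector, and scale factor are calculated as shown in Eq. (12).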

(12)
$ \chi_{init} = \left[ v_{b_0}^{b_0}, v_{b_1}^{b_1}, ..., v_{b_m}^{b_m}, g^{c_0}, S \right]. $

In Eq. (12), $v_{b_m}^{b_m}$ represents the velocity of the body in the body coordinate system at the moment of the $m$th image frame, and $g^{c_0}$ and $S$ are the gravity vector in the RCS and the scale factor, respectively. Fig. 6 depicts the primary workflow of the vision-IMU-encoder sliding window optimization.

The definition of the state variables and the definition of the objective function are the first and second steps, respectively, of the vision-IMU-encoder sliding window optimization shown in Fig. 6. Obtaining the IMU measurement error term and the visual measurement error term is the third step, and the fourth step is to determine whether the encoder measurement error term is needed. The fifth step is to obtain the covariance matrices of the visual, IMU, and encoder measurement error terms, and the sixth step is to obtain the state variables. The seventh step is to determine whether to use the current frame as a keyframe; if so, proceed to the subsequent closed-loop optimization, otherwise return to sensor initialization. Eq. (13) displays the body state expression.

(13)
$ X_m = \left[ q_{b_m}^w, p_{b_m}^w, v_{b_m}^w, b_{\partial_m}, b_{\omega_m} \right], m \in [0, M]. $

In Eq. (13), $(q_{b_m}^w, p_{b_m}^w)$ are the initial values of the body attitude and position, and $v_{b_m}^w$ is the initial value of the body velocity in the WCS. The encoder measurement error term is displayed in Eq. (14).

(14)
$ e_{e,m} = v_{b_m}^{b_m} - R_w^{b_m} v_{b_m}^w. $

In Eq. (14), $R_w^{b_m}$ represents the rotation matrix from the WCS to the airframe coordinate system at the moment of the IMU data in the $m$th frame. In the closed-loop optimization, the study optimizes the keyframe heading angles and airframe positions.
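Eq. (14) compares the encoder-derived body velocity with the current state estimate; a minimal sketch:

```python
import numpy as np

def encoder_error(v_body_meas, R_wb, v_world):
    """Eq. (14): residual between the measured body-frame velocity and the
    estimated world-frame velocity rotated into the body frame."""
    return v_body_meas - R_wb @ v_world
```

The state variables at the time of optimization are defined as shown in Eq. (15).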

(15)
$ \chi_{loop} = \left[ p_{b_0}^w, p_{b_1}^w, ..., p_{b_m}^w, \psi_{b_0}^w, \psi_{b_1}^w, ..., \psi_{b_m}^w \right]. $

In Eq. (15), $p_{b_m}^w$ represents the airframe position in the WCS in frame $m$. $\psi_{b_m}^w$ is the airframe heading angle in the WCS of frame $m$.

4. Performance Validation of Algorithms for Wheeled Robots

This component of the study verifies the enhanced ORB-SLAM2 and enhanced VINS-Mono algorithms' performance. In addition, the study constructs an indoor platform for testing two-wheeled mobile robots, chooses and gathers datasets for evaluating algorithm performance, and lastly verifies algorithm performance in terms of average time and position Root Mean Square Error (RMSE).
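Since position RMSE is the main accuracy metric in this section, a short sketch of how it is typically computed is given below; it assumes the estimated and ground-truth trajectories have already been time-aligned and expressed in the same frame.

```python
import numpy as np

def position_rmse(est_xyz, gt_xyz):
    """Position RMSE over N aligned trajectory samples of shape (N, 3)."""
    errors = np.linalg.norm(est_xyz - gt_xyz, axis=1)
    return float(np.sqrt(np.mean(errors ** 2)))
```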

4.1. Performance Validation of Improved ORB-SLAM2 Algorithm

To validate the performance of the algorithms designed in the study, an indoor experimental platform for a two-wheeled mobile robot was built. The experimental platform uses a two-wheeled differential structure and contains devices such as a visual-inertial camera, encoders, a Raspberry Pi, and servo drives. The wheels of the robot have a diameter of 0.21 m, and the weight and load are 20 kg and 10 kg, respectively. The Raspberry Pi is a model 4B, equipped with a 1.5 GHz 64-bit quad-core ARM Cortex-A72 processor. The visual-inertial camera is an Intel RealSense D455, with a depth range of [0.4 m, 10 m]. The encoder is the HEDL 9140 model from Maxon, and the servo drive model is RoboModule RMDS-301. Meanwhile, it is assumed that in most cases the ambient lighting conditions are relatively stable, avoiding strong lighting changes that may affect the visual sensors. These configurations may limit the real-time performance and processing capabilities of the algorithm. The operating system used for the experiment is Windows 11, and the central processor is an Intel Core i5-12600K with a maximum turbo frequency of 4.9 GHz and a maximum memory of 128 GB. The TUM dataset is used in the study to validate the enhanced ORB-SLAM2 algorithm's performance [21]. This dataset, as an RGB-D dataset, contains color and depth maps with a resolution of 640×480 on different motion trajectories, as well as real trajectories computed by a high-precision camera system. The pixels in the color and depth images are already in 1:1 correspondence. In addition, the dataset mainly involves indoor scenes, uses Microsoft Kinect sensors, and consists of 39 sequences. The Kinect sensor comes with a factory calibration based on high-order polynomial warping functions; its focal length is (525, 525) and its optical center is (319.5, 239.5). For comparison, the pre-improved ORB-SLAM2 algorithm, Manhattan SLAM, and collaborative monocular SLAM are chosen [22]. By comparing with the pre-improved ORB-SLAM2 algorithm, the performance changes of the improved ORB-SLAM2 algorithm can be seen more intuitively. Manhattan SLAM is an algorithm that combines point, line, and surface features based on the mixed Manhattan world assumption, and the improved multi-feature fusion algorithm designed in this paper also uses point, line, and surface features; therefore, comparing with the Manhattan SLAM algorithm more clearly verifies the performance of the proposed algorithm. Collaborative monocular SLAM, as one of the SLAM algorithms, can be compared to understand the gap between the proposed algorithm and current algorithms in the same field. In parameter selection, the study referred to the settings in the references to control the accuracy of the proposed algorithm [23, 24]. To obtain the optimal performance of the algorithm, the features of the proposed algorithm were tested through comparative tests with different values, while also referring to the settings of other similar literature [25]. The RMSE and MF comparisons of the trajectory positions of the different algorithms on different data packets are shown in Fig. 7.

Fig. 7. Comparison of RMSE and MF of trajectory position on different data packets by different algorithms.

../../Resources/ieie/IEIESPC.2026.15.2.293/fig7.png

Fig. 8. The effect comparison of different algorithms on wheeled robots.

../../Resources/ieie/IEIESPC.2026.15.2.293/fig8.png

In Fig. 7(a), for different data packets, the maximum values of RMSE for the ORB-SLAM2 algorithm, the improved ORB-SLAM2 algorithm, the Manhattan SLAM algorithm, and the collaborative monocular SLAM algorithm are 0.039 m, 0.058 m, 0.086 m, and 0.078 m, respectively, and the minimum values are 0.018 m, 0.015 m, 0.018 m, and 0.017 m, respectively. While none of the other three methods suffer tracking losses, the ORB-SLAM2 algorithm has three. According to Fig. 7(b), across the various data packets, the numbers of MFs for the collaborative monocular SLAM algorithm, the improved ORB-SLAM2 algorithm, and the Manhattan SLAM algorithm all have a minimum value of 0 and maximum values of 696, 797, and 758, respectively. The improved ORB-SLAM2 algorithm performs better and has higher localization accuracy. To validate the effect of the improved algorithm on WR, the study collected regular indoor datasets containing packets of three motion trajectories. The comparison of the effect of different algorithms on WR is shown in Fig. 8.

From Fig. 8(a), the RMSEs of the Manhattan SLAM algorithm, the improved ORB-SLAM2 algorithm, the ORB-SLAM2 algorithm, and the collaborative monocular SLAM algorithm on the indoor_general_quad packet are 0.168 m, 0.152 m, 0.187 m, and 0.175 m, respectively. The RMSEs of the four algorithms on the indoor_general_splay packet are 0.184 m, 0.168 m, 0.196 m, and 0.187 m, and on the indoor_general_twist packet are 0.196 m, 0.176 m, 0.202 m, and 0.195 m, respectively. From Fig. 8(b), the numbers of MFs for the Manhattan SLAM algorithm, the improved ORB-SLAM2 algorithm, and the collaborative monocular SLAM algorithm on the indoor_general_quad packet are 74, 0, and 1, respectively. The numbers of MFs for the three algorithms on the indoor_general_splay packet are 223, 4, and 5, respectively, and on the indoor_general_twist packet they are 219, 8, and 9, respectively. It is evident that the improved ORB-SLAM2 algorithm performs better when applied to WR. Table 1 displays the comparison of the average processing time of a single frame for the different algorithms under the TUM dataset and the ordinary indoor dataset.

Table 1. Comparison of the average time to process a single image for different algorithms on the TUM dataset and the conventional indoor dataset.

Data set | ORB-SLAM2 | Improved ORB-SLAM2 | Manhattan SLAM | Collaborative monocular SLAM
fr3_s_t_far | 42.8 ms | 75.1 ms | 95.8 ms | 94.7 ms
fr3_s_t_near | 39.8 ms | 68.9 ms | 78.6 ms | 77.5 ms
fr3_s_nt_far | * | 44.2 ms | 44.9 ms | 45.8 ms
fr3_s_nt_near | * | 42.2 ms | 42.9 ms | 42.7 ms
fr3_large_cabinet | * | 62.1 ms | 72.3 ms | 70.8 ms
fr1_xyz | 38.9 ms | 69.8 ms | 98.4 ms | 95.4 ms
fr1_desk | 44.8 ms | 65.2 ms | 73.4 ms | 71.9 ms
fr2_xyz | 38.5 ms | 63.7 ms | 88.5 ms | 84.8 ms
fr2_desk | 44.8 ms | 77.9 ms | 89.5 ms | 87.3 ms
indoor_general_quad | 38.6 ms | 61.8 ms | 75.2 ms | 74.5 ms
indoor_general_splay | 38.9 ms | 62.7 ms | 75.6 ms | 74.8 ms
indoor_general_twist | 40.6 ms | 69.2 ms | 77.6 ms | 76.5 ms

In Table 1, * represents tracking loss. On the TUM dataset, the maximum values of the average time for the ORB-SLAM2, improved ORB-SLAM2, Manhattan SLAM, and collaborative monocular SLAM algorithms are 44.8 ms, 77.9 ms, 98.4 ms, and 95.4 ms, respectively, and the minimum values are 38.5 ms, 42.2 ms, 42.9 ms, and 42.7 ms, respectively. Under the regular indoor dataset, the maximum values of the average time of the four algorithms are 40.6 ms, 69.2 ms, 77.6 ms, and 76.5 ms, and the minimum values are 38.6 ms, 61.8 ms, 75.2 ms, and 74.5 ms, respectively. It is evident that the improved ORB-SLAM2 algorithm performs well and works well on WR.

Compared to other methods, the unique feature of the improved ORB-SLAM2 algorithm lies in its combination of point, line, and surface features, and this multi feature fusion approach can more comprehensively describe the geometric structure of the environment. At the same time, it uses MF to reduce accumulated errors, improve positioning accuracy and reconstruction effectiveness. In addition, the advantage of the improved ORB-SLAM2 algorithm is that it can successfully estimate the trajectory of ORB-SLAM2 tracking failure in low texture scenes, with stronger robustness.

4.2. Performance Validation of Improved VINS-Mono Algorithm

Through the experimental comparison in Subsection 4.1, it was found that the improved ORB-SLAM2 algorithm has maximum and minimum RMSE of 0.058 m and 0.015 m, respectively, under different data packets, with no tracking loss, resulting in better performance. The study gathered regular indoor datasets and feature-sparse indoor datasets to validate the performance of the improved VINS-Mono algorithm. At this point, there are environmental constraints that may lead to tracking failure or decreased accuracy in environments with sparse features, especially when the robot is turning or moving at high speed. Meanwhile, it is assumed that the improved VINS-Mono algorithm can maintain a certain level of robustness in challenging scenarios such as feature sparsity and robot steering. The robot, operating system, and other settings used for the experiments are consistent with those in the previous section and are not repeated here. The study divides the improved VINS-Mono algorithm into two types, with and without slip detection. For the slipping experiments, the study set up four slipping trajectories, and the sensor packet used was indoor_sparse_slip. A comparison of the algorithms' trajectories and displacements on the indoor_sparse_slip packet is shown in Fig. 9.

Fig. 9. Comparison of trajectories and displacements of algorithms on indoor_sparse_slip data packet.

../../Resources/ieie/IEIESPC.2026.15.2.293/fig9.png

In Fig. 9(a), the trajectory of the improved VINS-Mono algorithm with slip detection is very close to the real trajectory. When the distance in the $x$-axis direction is 7 m to 9 m, there is a more obvious gap between the trajectory of the improved VINS-Mono algorithm without slip detection and the true trajectory, with a difference of about 0.2 m. Fig. 9(b) shows that, for displacement in the $x$-axis direction, the displacement of the improved VINS-Mono algorithm with slip detection is very close to the true displacement, while without slip detection the displacement is larger than the true displacement. From Fig. 9(c), for displacement in the $y$-axis direction, the displacement of the improved VINS-Mono algorithm with slip detection is, on the whole, also very close to the true displacement. From 71 s to 78 s, the difference between the displacement without slip detection and the true displacement is largest, at about 0.18 m. It can be seen that the improved VINS-Mono algorithm with slip detection performs better. The positional RMSE and time comparisons of the different algorithms on different datasets are shown in Fig. 10.

Fig. 10. Comparison of RMSE and time for different algorithms on different datasets.

../../Resources/ieie/IEIESPC.2026.15.2.293/fig10.png

Based on the indoor_general_quad packet, the RMSE of the improved ORB-SLAM2 algorithm is 0.152 m, as shown in Fig. 10(a), while the RMSE of the improved VINS-Mono algorithm is 0.149 m with closed-loop detection and 0.175 m without it. The three algorithms take 61.8 ms, 97.8 ms, and 73.2 ms to execute, respectively. On the indoor_general_splay packet, the RMSEs of the three algorithms are 0.168 m, 0.162 m, and 0.184 m, and the runtimes are 62.7 ms, 89.3 ms, and 69.9 ms, respectively. On the indoor_general_twist packet, the RMSEs of the three algorithms are 0.176 m, 0.185 m, and 0.189 m, with running times of 69.2 ms, 99.2 ms, and 76.8 ms, respectively. Based on Fig. 10(b), the RMSE of the pre-improvement VINS-Mono algorithm on the indoor_sparse_quad packet is 2.549 m, while the RMSE of the improved VINS-Mono algorithm is 0.163 m and 0.188 m with and without closed-loop detection, respectively. Under the indoor_sparse_splay and indoor_sparse_twist datasets, the RMSEs of the three algorithms are 0.152 m, 0.176 m, and 4.298 m, and 0.198 m, 0.241 m, and 3.198 m, respectively. Under the three feature-sparse indoor datasets, the sliding-window optimization times of the improved VINS-Mono algorithm are 36.5 ms, 38.4 ms, and 32.8 ms, respectively. The improved VINS-Mono algorithm with closed-loop detection performs better. The displacement and heading angle comparisons under different datasets are shown in Fig. 11.

Fig. 11. Comparison of displacement and heading angle under different datasets.

../../Resources/ieie/IEIESPC.2026.15.2.293/fig11.png

For the indoor_general_quad packet, the improved VINS-Mono algorithm with closed-loop detection shows only a small gap between its displacement and the true displacement in the $x$- and $y$-axis directions (Figs. 11(a)-11(c)), and the heading angle gap is likewise small. When the time is from 70 s to 100 s, the improved VINS-Mono algorithm without closed-loop detection has a larger gap between the displacement in the $y$-axis direction and the true displacement, of about 0.3 m. Figs. 11(d)-11(f) show that, under the indoor_sparse_quad packet, the improved VINS-Mono algorithm with closed-loop detection has a small gap between the displacements in the $x$- and $y$-axis directions and the true displacements, and the gap in heading angle is also small, whereas the displacement of the algorithm without closed-loop detection differs more from the true displacement in both directions. Better performance is achieved by the improved VINS-Mono algorithm with closed-loop detection.

To better validate the performance of the improved VINS-Mono algorithm designed in the paper, another robot platform was used for experiments. The overall size of the platform is $350 \times 300$ mm, driven by a differential drive, with a battery capacity of 30000 mAh, a load of 25 kg, and a maximum speed of 0.8 m/s. In addition, the platform also includes attitude sensors and a Light Detection and Ranging (LiDAR) sensor. The angular velocity range of the attitude sensor is $\pm 10/20/50/250^\circ$/s and its angular range is $\pm 180^\circ$; the scanning frequency of the LiDAR sensor is 15 Hz and its measurement blind spot is 0.2 m. The research named this experimental platform Platform 2. The comparison of the computational efficiency of different algorithms on Platform 2 is shown in Table 2.

Table 2. Comparison of computational efficiency of different algorithms on Platform 2.

Data packet | Improved ORB-SLAM2 | VINS-Mono before improvement | Improved VINS-Mono without closed-loop detection | Improved VINS-Mono with closed-loop detection
indoor_general_quad | 61.8 ms | 69.4 ms | 73.2 ms | 97.8 ms
indoor_general_splay | 62.7 ms | 67.8 ms | 70.9 ms | 89.3 ms
indoor_general_twist | 69.2 ms | 65.9 ms | 76.8 ms | 99.2 ms
indoor_sparse_quad | - | 34.3 ms | 35.2 ms | 36.5 ms
indoor_sparse_splay | * | 35.9 ms | 36.8 ms | 38.4 ms
indoor_sparse_twist | - | 33.6 ms | 33.4 ms | 33.0 ms

Note: "*" in Table 2 indicates severe trajectory fluctuations estimated by the algorithm, and "-" indicates tracking failure.

Under the conventional indoor datasets indoor_general_quad, indoor_general_splay, and indoor_general_twist, the minimum computational efficiency of the improved ORB-SLAM2 algorithm designed in the paper is 61.8 ms, which is significantly lower than that of the other algorithms, giving it better real-time performance. In addition, the average computational efficiency of the improved VINS-Mono algorithm with closed-loop detection is 95.4 ms, which differs significantly from the other algorithms; this is because closed-loop optimization requires a large amount of computation and consumes a significant amount of computing resources. On the feature-sparse indoor datasets indoor_sparse_quad, indoor_sparse_splay, and indoor_sparse_twist, the average computational efficiency of the improved VINS-Mono algorithm with closed-loop detection is 36.0 ms, with only a small difference from the comparison algorithms and good real-time performance. To verify the robustness of the method designed in the paper, experiments were conducted with the robot placed in outdoor scenes. There are three types of outdoor scenes, namely a spacious teaching building, moving around the library, and walking from a step about 1.5 m high into a garage about 3 m high. The optical axis of the robot's D435i camera was set at an elevation angle of about $32^\circ$ relative to the ground, and data packets were re-collected for the three scenarios, namely outdoor_spacious, outdoor_cuboid, and outdoor_3D. The comparison of RMSE and computational efficiency of different algorithms on the different scene data packets is shown in Table 3.

Table 3. Comparison of RMSE and computational efficiency of different algorithms in different scene data packets.

Algorithm | RMSE (outdoor_spacious / outdoor_cuboid / outdoor_3D) | Computational efficiency (outdoor_spacious / outdoor_cuboid / outdoor_3D)
Improved ORB-SLAM2 | 2.65 / 1.33 / 1.46 | 61.6 ms / 64.4 ms / 65.6 ms
Improved VINS-Mono without closed-loop detection | 0.305 / 0.337 / 0.323 | 63.7 ms / 74.5 ms / 75.5 ms
Improved VINS-Mono with closed-loop detection | 0.268 / 0.283 / 0.279 | 88.0 ms / 98.0 ms / 94.4 ms

From Table 3, it can be seen that, across the different scenarios, the average computational efficiency and average RMSE of the improved VINS-Mono algorithm with closed-loop detection designed in the paper are 93.47 ms and 0.28, respectively. The average computational efficiency of the improved ORB-SLAM2 algorithm and the improved VINS-Mono algorithm without closed-loop detection is 63.87 ms and 71.23 ms, respectively, and their average RMSE is 1.81 and 0.32. It can be seen that, in the three scenarios, the per-frame computation time of this algorithm is higher than that of the comparison algorithms, but its RMSE is lower. This indicates that the improved VINS-Mono algorithm with closed-loop detection has higher accuracy and stronger robustness.

Compared to other methods, the unique feature of the improved VINS-Mono algorithm is the integration of a monocular camera, IMU, and encoder. This fusion approach utilizes not only the visual information of the monocular camera, but also the inertial measurement data of the IMU and the velocity information of the encoder, enhancing the observability of scale and solving the problem of inaccurate positioning of VINS-Mono in wheeled robot applications. At the same time, a slip factor was introduced, effectively reducing the impact of slip on the positioning results. The advantage of this algorithm is that it significantly improves positioning accuracy and maintains high robustness in feature-sparse scenes.

5. Conclusion

To enhance the autonomous mobility of WR, the study designed an improved ORB-SLAM2 algorithm based on PLS MFF, as well as an improved VINS-Mono algorithm based on multi-sensor fusion. The results revealed that the maximum values of RMSE for the pre-improved ORB-SLAM2 algorithm, the improved ORB-SLAM2 algorithm, the Manhattan SLAM algorithm, and the collaborative monocular SLAM algorithm were 0.039 m, 0.058 m, 0.086 m, and 0.073 m, respectively, for different packets. The pre-improved ORB-SLAM2 algorithm experienced three tracking losses, while the improved algorithm had none. The improved ORB-SLAM2 algorithm achieved better localization accuracy and better performance. On the indoor_sparse_quad packet, the RMSE of the pre-improvement VINS-Mono algorithm, the improved VINS-Mono algorithm with closed-loop detection, and the improved VINS-Mono algorithm without closed-loop detection were 2.549 m, 0.163 m, and 0.188 m, respectively. On the three feature-sparse indoor datasets, the sliding-window optimization times of the improved VINS-Mono algorithm were 36.5 ms, 38.4 ms, and 32.8 ms, respectively. The improved VINS-Mono algorithm performed better than the pre-improvement algorithm, and the version with closed-loop detection performed best.

There are four shortcomings in the research. Firstly, to improve the robot's localization accuracy, the study introduced multiple features and multiple sensors, but this also increased the computation time of the algorithm and reduced its real-time performance. To improve the real-time performance of the algorithm, future research could study reducing the algorithm's elapsed time, for example by optimizing the feature extraction algorithms to reduce feature extraction time, or by using incremental optimization methods to reduce the backend optimization computation and time. Secondly, the multi-feature improvement algorithm and the multi-sensor algorithm designed in the paper exist separately and do not combine the advantages of the two methods. Future research can introduce multiple features and multiple sensors in the same algorithm, such as introducing IMUs and encoders to improve the robustness of RGB-D SLAM in challenging scenarios such as low texture, robot turning, and open spaces, establishing 3D maps containing points, lines, and surfaces, and improving reconstruction results. Thirdly, the multi-feature improvement algorithm designed in the paper is suitable for conventional indoor environments, and it faces the problem of low accuracy in indoor environments with sparse features or in outdoor environments. Future research can combine different sensors to address this issue. The improved VINS-Mono algorithm designed in the paper adopts multiple sensors, which can provide guidance for subsequent research on the application of multiple sensors and has a certain influence and relevance. Fourthly, the depth and breadth of the multi-sensor fusion are limited, and there is insufficient research on the fusion of other sensors that may help positioning accuracy, which limits the robustness and accuracy improvement of the algorithm in complex environments. Future research can increase the variety of sensors, such as introducing Light Detection and Ranging or ultrasonic sensors, or optimize the fusion methods, for example by automatically learning the relationships between sensor data through neural networks to improve fusion accuracy.

With the development of machine learning, edge computing, and advanced sensors, future research can use these technologies to improve the capability and performance of SLAM algorithms. Firstly, future research can use machine learning techniques for pose estimation, scene understanding, map construction, and localization, and improve traditional SLAM algorithms through training data, thereby enhancing the accuracy and stability of pose estimation. Secondly, future research can offload part of the visual SLAM system through edge computing, allowing the visual SLAM system to run for a long time with limited resources without affecting accuracy, while keeping the computing and memory costs on mobile devices unchanged. Finally, future research can integrate sensor technology with SLAM to achieve complementary capabilities of radar and vision, and achieve more robust SLAM.

References

[1] Shafaei S. M., Mousazadeh H., 2023, Characterization of motion power loss of off-road wheeled robot in a slippery terrain, Journal of Field Robotics, Vol. 40, No. 1, pp. 57-72.
[2] Hu B., Cao Z., Zhou M., 2020, An efficient RRT-based framework for planning short and smooth wheeled robot motion under kinodynamic constraints, IEEE Transactions on Industrial Electronics, Vol. 68, No. 4, pp. 3292-3302.
[3] Ortiz S., Yu W., 2021, Autonomous navigation in unknown environment using sliding mode SLAM and genetic algorithm, Intelligence & Robotics, Vol. 1, No. 2, pp. 131-150.
[4] Hu J., Shi X., Ma C., Yao X., Wang Y., 2023, M3LVI: A multi-feature, multi-metric, multi-loop, LiDAR-visual-inertial odometry via smoothing and mapping, Industrial Robot, Vol. 50, No. 3, pp. 483-495.
[5] Groumpos P. P., 2023, A critical historic overview of artificial intelligence: Issues, challenges, opportunities, and threats, Artificial Intelligence and Applications, Vol. 1, No. 4, pp. 197-213.
[6] Lan C. W., Liu C. H., 2020, The research of the RFID anchor assisted SLAM on an autonomous patrolling system, Journal of Marine Science and Technology, Vol. 28, No. 5, pp. 394-403.
[7] Recker T., Heilemann F., Raatz A., 2021, Handling of large and heavy objects using a single mobile manipulator in combination with a roller board, Procedia CIRP, Vol. 97, pp. 21-26.
[8] Cremona J., Comelli R., Pire T., 2022, Experimental evaluation of visual-inertial odometry systems for arable farming, Journal of Field Robotics, Vol. 39, No. 7, pp. 1121-1135.
[9] Chen Z., Liu Y., He W., Qiao H., Ji H., 2020, Adaptive-neural-network-based trajectory tracking control for a nonholonomic wheeled mobile robot with velocity constraints, IEEE Transactions on Industrial Electronics, Vol. 68, No. 6, pp. 5057-5067.
[10] Trybala P., John A., Kohler C., 2022, Towards a mine 3D dense mapping mobile robot: A system design and preliminary accuracy evaluation, Markscheidewesen, Vol. 129, No. 1, pp. 18-24.
[11] Yang Z., Li M., Zha F., Wang X., Guo W., 2021, Imitation learning of a wheeled mobile manipulator based on dynamical movement primitives, Industrial Robot, Vol. 48, No. 4, pp. 556-568.
[12] Benseddik H. E., Morbidi F., Caron G., 2020, PanoraMIS: An ultra-wide field of view image dataset for vision-based robot-motion estimation, The International Journal of Robotics Research, Vol. 39, No. 9, pp. 1037-1051.
[13] Vasilopoulos V., Pavlakos G., Schmeckpeper K., Daniilidis K., Koditschek D. E., 2022, Reactive navigation in partially familiar planar environments using semantic perceptual feedback, The International Journal of Robotics Research, Vol. 41, No. 1, pp. 85-126.
[14] Yang F., Cao Y., Zhang W., 2022, PSL-SLAM: A monocular SLAM system using points and structure lines in Manhattan World, International Journal of Intelligent Robotics and Applications, Vol. 6, No. 1, pp. 52-68.
[15] Ran T., Yuan L., Zhang J., Tang D., He L., 2021, RS-SLAM: A robust semantic SLAM in dynamic environments based on RGB-D sensor, IEEE Sensors Journal, Vol. 21, No. 18, pp. 20657-20664.
[16] Long Z., Zhang X., He M., Huang S., Qin G., Song D., 2022, Motor fault diagnosis based on scale invariant image features, IEEE Transactions on Industrial Informatics, Vol. 18, No. 3, pp. 1605-1617.
[17] Mahale P., Shaikh F. M., 2023, Simplified Levenberg-Marquardt method in Banach spaces for nonlinear ill-posed operator equations, Applicable Analysis, Vol. 102, No. 1, pp. 124-148.
[18] Zeng Q., Gao C., Chen Z., Jin Y., Kan Y., 2021, Robust mono visual-inertial odometry using sparse optical flow with edge detection, IEEE Sensors Journal, Vol. 22, No. 6, pp. 5260-5269.
[19] Gui H., Wang Y., Su W., 2021, Hybrid global finite-time dual-quaternion observer and controller for velocity-free spacecraft pose tracking, IEEE Transactions on Control Systems Technology, Vol. 29, No. 5, pp. 2129-2141.
[20] Arslan S., Tezer-Sezgin M., 2021, Convergence, stability, and numerical solution of unsteady free convection magnetohydrodynamical flow between two slipping plates, Mathematical Methods in the Applied Sciences, Vol. 45, No. 1, pp. 21-35.
[21] Xu B., Davison A. J., Leutenegger S., 2021, Deep probabilistic feature-metric tracking, IEEE Robotics and Automation Letters, Vol. 6, No. 1, pp. 223-230.
[22] Jang Y., Oh C., Lee Y., Kim H. J., 2021, Multirobot collaborative monocular SLAM utilizing rendezvous, IEEE Transactions on Robotics, Vol. 37, No. 5, pp. 1469-1486.
[23] Mota F. A. X., Batista J. G., Alexandria A. R., 2024, Proposal of simultaneous localization and mapping for mobile robots indoor environments using Petri nets and computer vision, The International Journal of Advanced Manufacturing Technology, Vol. 135, No. 7, pp. 3991-4014.
[24] Zhang H., Jin H., Ma S., 2023, Recent advances in robot visual SLAM, Recent Advances in Computer Science and Communications, Vol. 16, No. 8, pp. 19-37.
[25] Chen J., Chen Z., San H., Zhao L., Peng Z., 2024, Adaptive illumination enhanced monocular vision SLAM algorithm for mobile robots based on convolutional neural networks, Nongye Jixie Xuebao/Transactions of the Chinese Society for Agricultural Machinery, Vol. 55, No. 12, pp. 383-391.
Qin Dong
../../Resources/ieie/IEIESPC.2026.15.2.293/au1.png

Qin Dong graduated from Nanjing University of Science and Technology, majoring in computer science and technology (2007). Currently, she is an Associate Professor and the Dean of the Department of Artificial Intelligence in the School of Computer Engineering at Yancheng Institute of Technology. She has published in more than 20 reputable international peer-reviewed journals and conference proceedings. Her research interests include deep learning, embodied robotics, and computer vision.