
  1. (Department of IT Convergence Engineering, Kumoh National Institute of Technology, Gumi, Korea)

UAVs, CNN, Path planning, Stair climbing, LiDAR sensor

1. Introduction

UAV use is growing in areas such as scientific research, rescue missions, commerce, and agriculture. Originally, UAVs were developed to be operated by a pilot on the ground via remote-control communication [1]. Recently, UAVs have been moving toward navigation with a high degree of autonomy. Most UAVs employ global navigation satellite system technology and inertial sensors to determine their geospatial position. For stable UAV flight in indoor environments, it is necessary to overcome factors such as GPS signal error, narrow passageways, and transparent glass [2]. Studies on image-based stair recognition for robots [3] and on stair-climbing techniques for ground robots [4] are ongoing; however, such research is lacking for UAVs. Many techniques, both learning-based and non-learning-based, have been suggested to resolve UAV navigation problems. The most popular non-learning-based method is sense-and-avoid, which prevents accidents by steering the vehicle in the opposite direction and navigating by path planning [5,6]. Another type of non-learning-based technique takes advantage of simultaneous localization and mapping (SLAM). The idea is that, after a map of the surroundings is created with SLAM, navigation is accomplished by path planning [7,8]. The work in [7] combines GraphSLAM [9] with an online path planning module so that a UAV can determine obstacle-free trajectories in foliage. A general characteristic of non-learning-based approaches is that they demand precise path planning, which may result in unanticipated failures when environments are extremely dynamic and complicated. To address this matter, machine learning (ML) methods such as imitation learning and reinforcement learning (RL) have been explored [10-12]. For example, a model-based RL approach called TEXPLORE [12] was presented as a high-level control system for navigating a UAV within a grid map with no barriers. In addition, an imitation learning-based controller trained on a small set of human demonstrations was presented that achieves reliable performance in forested areas [10].

Therefore, this paper proposes a convolutional neural network (CNN)-based system that recognizes stairs in real time with a camera mounted on the UAV, obtains the distance to walls or stairs through 2D light detection and ranging (LiDAR), and flies the UAV without colliding with the stairs. In addition, algorithms were designed for recognizing stairs, which are one of the obstacles to autonomous flight, for avoiding collisions, and for maneuvering, and flight experiments were carried out after the system was implemented on an actual UAV.

Deep learning (DL), a subcategory of machine learning, is loosely inspired by the human brain and is widely used in artificial intelligence (AI). Many applications of machine learning have been proposed, with different signals representing data such as music signals [13], 2D signals or images [14], and video signals [15]. CNNs are used for various purposes, such as classification, detection, and pattern recognition, especially in health [16], drone applications [17], and autonomous driving systems. Recently, You Only Look Once (YOLO) was introduced for real-time detection of objects, with successive versions improving the mean average precision (mAP) and the number of frames processed per second [18].

In this work, we attempted for the first time to use the YOLOv3-tiny model, and improved the model further by adding a convolution layer to extract deep features for the detection of stairs. This DL detection model was used in a classification problem to determine each next maneuver.

The rest of this paper is organized as follows. Section 2 details related work, while Section 3 explains the proposed scheme. Section 4 summarizes the experimental results and the analysis. Section 5 provides concluding statements and suggests the scope of future work.

2. Related Work

Previously, a 3D map of the local area was developed for autonomous UAV navigation. In some cases, these methods were used for precise quadcopter maneuvers [19,20]. However, these methods rely on a smart control scheme, which restricts their use to laboratory settings [21-23]. In other work, the map is learned from manually flown routes, and the quadcopter then travels the same path [24]. For most outdoor flights (where the required precision is not as high as indoors), GPS-based pose estimation is used.

Most applications use range sensors, such as infrared sensors, RGB-D (red, green, blue, depth) sensors, or laser range sensors [25]. In [26], a single ultrasonic sensor was used together with an infrared sensor for automated navigation. A state estimation method using LiDAR and an inertial measurement unit (IMU) was developed to operate independently in uncertain, GPS-denied conditions [27]. Range sensors have limitations: they are heavy and consume a lot of power.

The simultaneous localization and mapping (SLAM) technique uses separate optical sensors to create a 3D map [21-23] and to track the UAV position on that map. In [25], SLAM with a laser range finder was used to build a 3D map of an unknown indoor scene. The SLAM techniques in [29,31] offer single-camera indoor navigation. SLAM is highly complicated when it comes to reconstructing the 3D map region, requiring precise measurements and extensive resources because additional sensors are needed.

SLAM can also introduce delays during real-time navigation; the studies in [31] and [32] addressed these issues. SLAM is primarily a practical system, but its output on indoor surfaces (such as walls and ceilings) is not considered good, because their intensity gradient is very weak. Because an entire corridor consists of walls, ceilings, and floors, SLAM technologies cannot attain the desired navigational quality there.

3. The Proposed Scheme

This section discusses the system configuration for UAV recognition of stairs, the deep learning model using YOLOv3-tiny, and the improved YOLOv3-tiny for detecting stairs.

3.1 System Configuration

The proposed system was designed for indoor environments, based on recognizing stairs with a camera mounted on the UAV and on distances measured via the 2D LiDAR sensor attached to the UAV’s side. Fig. 1 shows the flowchart for the entire system. The connections and communications between the parts are both wired and wireless, as shown in Fig. 2. In particular, communications among the ground control station, the UAV, and the onboard PC are via Wi-Fi/LTE, whereas the wired connection is used only for the sensor.

The system is implemented on a Parrot Bebop 2 drone, which is suitable for narrow passageways and convenient for carrying sensors. The UAV is equipped with an RPLiDAR S1 laser scanner, which rotates 360$^{\circ}$ and can measure distances up to 40 m, and with a lightweight Jetson TX2 embedded computing device on an Auvidea J120 carrier board, as shown in Fig. 3(c). A Lenovo ThinkPad T580 is used as the ground control station (GCS), and the equipment required for the experiment is listed in Table 1. All algorithms are implemented in Python, and the Kinetic version of the Robot Operating System (ROS) is used as middleware (software that lets multiple programs run together).

Fig. 1. Flowchart for the proposed implementation.


Fig. 2. Network connections and the architecture of the proposed system.


Fig. 3. System configuration: (a) UAV movement axes; (b) illustration of the RPLiDAR S1 scanning process; (c) the 2D-LiDAR sensor and the Jetson-TX2 onboard PC attached to the UAV; (d) the test environment.


Table 1. Experiment Parameters.


| | Model name |
|---|---|
| Lidar sensor | RPLiDAR S1 |
| UAV | Bebop drone 2 |
| Onboard PC | Jetson TX2 |
| Carrier board | Auvidea J120 |
| GCS | ThinkPad T580 |
| LTE modem | |

Algorithm 1. Stair-climbing algorithm.


The LiDAR sensor measures distances at 360 points, as shown in Fig. 3(b). Relative to the UAV’s direction of travel, the distance data obtained by the LiDAR sensor correspond to the floor at 0$^{\circ}$, the front at 90$^{\circ}$, and the ceiling at 180$^{\circ}$. In the polar coordinate system, the raw laser points are defined as $\{(d_{i}, \theta_{i});\ 0 \leq i \leq 359\}$, where $d_{i}$ is the distance from the UAV center to the object, and $\theta_{i}$ is the relative angle of measurement. The information obtained by the LiDAR is stored as a vector $(d_{i}, \theta_{i})$, and the stored data are checked so that infinite scan values can be converted.
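The scan handling described above can be illustrated with a minimal Python sketch. It assumes the scan arrives as 360 range readings indexed by degree, and that infinite (or NaN) readings are replaced by the 40 m maximum range of the sensor; the replacement value and the function names `clean_scan` and `distance_at` are assumptions for illustration, since the text does not specify them.

```python
import math

MAX_RANGE_M = 40.0  # maximum range of the RPLiDAR S1 stated in the text


def clean_scan(ranges, max_range=MAX_RANGE_M):
    """Return 360 (d_i, theta_i) pairs with infinite/NaN readings replaced.

    ranges: sequence of 360 distances in meters, one per degree.
    The replacement by max_range is an assumption, not the authors' rule.
    """
    if len(ranges) != 360:
        raise ValueError("expected 360 range readings")
    points = []
    for i, d in enumerate(ranges):
        if math.isinf(d) or math.isnan(d):
            d = max_range
        points.append((d, float(i)))  # theta_i in degrees
    return points


def distance_at(points, angle_deg):
    """Look up the distance measured at a given angle (wraps around 360)."""
    return points[int(angle_deg) % 360][0]
```

With the angle convention given in the text, `distance_at(points, 0)` would be the distance to the floor, `distance_at(points, 90)` the distance to the front, and `distance_at(points, 180)` the distance to the ceiling.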

3.2 Stair-climbing System

Algorithm 1 is used by the UAV to climb stairs. The algorithm starts when stairs are recognized by the camera. If the distance between the UAV and the stairs is greater than $\textit{r}$ m, the UAV flies straight ahead along the $\textit{x-axis}$; if the distance is less than $\textit{r}$ m, it performs a rising maneuver along the $\textit{z-axis}$ to avoid a collision. If, at this point, a staircase is no longer recognized, the stair-climbing mission is considered complete, and recognition for climbing the next stair commences.
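The decision rule described in this subsection can be sketched as a single control step in Python. This is only an illustration under stated assumptions, not the authors' Algorithm 1: the function name, the command labels, and the handling of a distance exactly equal to r (treated here as "too close") are hypothetical, and the value of r is not given in the text.

```python
def next_maneuver(stairs_detected, front_distance_m, r):
    """Choose the next command for one control step.

    stairs_detected: True if the camera-based detector reports stairs.
    front_distance_m: LiDAR distance to the stairs ahead, in meters.
    r: safety threshold in meters (its value is not stated in the text).
    """
    if not stairs_detected:
        # No staircase in view: the current climb is considered complete.
        return "climb_complete"
    if front_distance_m > r:
        # Far enough from the stairs: fly straight along the x-axis.
        return "forward_x"
    # At or inside the threshold: rise along the z-axis to avoid a collision.
    return "ascend_z"
```

Calling this function repeatedly with fresh detector and LiDAR readings would alternate forward and upward motion until the stairs leave the camera view.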

3.3 Deep Learning Model for Detection of Stairs

In this study, a DL approach is implemented for detecting stairs, which the drone uses to follow the stairs and decide on the next maneuver. We improved the default YOLOv3-tiny model. The backbone of YOLO is $\textit{darknet}$, and the default YOLOv3-tiny model uses six max-pooling and seven convolution layers. We modified it by adding one more convolution layer. Instead of the softmax function, regression is employed to solve the multi-class detection and classification problem [33].

The proposed model starts by dividing the stair-image input into a G ${\times}$ G grid in the training stage. A bounding box is used as a tool for labeling five features: width $\textit{w}$, height $\textit{h}$, vertical coordinate $\textit{v}$, and horizontal coordinate $\textit{u}$, as shown in Fig. 4, together with a confidence score $\textit{C}$, which represents the presence of stairs within the bounding box and hence the accuracy of the detection.
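To make the grid division concrete, the following Python sketch maps a bounding-box position to the grid cell that contains it. It assumes that $\textit{u}$ and $\textit{v}$ are the horizontal and vertical pixel coordinates of the box center, that the input is the 428 x 428 image used for training, and that G = 13; the interpretation of $\textit{u}$ and $\textit{v}$, the value of G, and the function name are assumptions for illustration only.

```python
def grid_cell(u, v, image_size=428, g=13):
    """Return (row, col) of the G x G grid cell that contains point (u, v).

    u: horizontal pixel coordinate, v: vertical pixel coordinate.
    image_size and g are illustrative defaults, not values confirmed
    by the text for the grid itself.
    """
    if not (0 <= u < image_size and 0 <= v < image_size):
        raise ValueError("point lies outside the image")
    cell = image_size / g  # side length of one grid cell in pixels
    col = min(int(u // cell), g - 1)
    row = min(int(v // cell), g - 1)
    return row, col
```

In a YOLO-style detector, the cell returned here would be the one responsible for predicting the box whose center falls inside it.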

Fig. 4. Definition of the bounding box.


Fig. 5. ROS node graph.


Fig. 6. YOLO models: the default YOLOv3-tiny and the improved YOLOv3-tiny.


In the proposed YOLOv3-tiny method, we attempt to keep the model computationally inexpensive while enabling it to extract more semantic features. Max-pooling is used after each convolution layer to reduce the computational complexity and improve image feature extraction. Fig. 6 shows the network architecture for both the default and the improved YOLOv3-tiny models. The network is trained end to end, and the loss function can be expressed as follows [33]:

$loss=\sum _{i=0}^{S^{a}}\,\textit{iouErr}+\textit{coordErr}+\textit{clsErr}$

where $\textit{iouErr}$, $\textit{coordErr}$, and $\textit{clsErr}$ indicate the IOU error, coordinates error, and classification error, respectively. We used a rectified linear unit (ReLU) as an activation function to achieve sparsity and reduce vanishing gradient issues [25]. Table 2 details the training configuration employed for both YOLOv3-tiny and the proposed improved YOLOv3-tiny model.

Table 2. Training Parameters for Both Models.

| Parameters for training | Configuration values |
|---|---|
| Input image size | 428 x 428 |
| Batch size | |
| Learning rate | |
| Optimizer | Stochastic gradient descent |







3.4 ROS

The nodes, which are separated and managed by the master, are shown in Fig. 5. A publisher node continuously publishes its processed results on a topic, which makes them available to other nodes by subscription. The messages used in the proposed system are mainly a UAV status message, a $\textit{scan}$ value obtained from the LiDAR, and an image message obtained from the UAV camera. When $\textit{darknet}$ runs on the ROS, it subscribes to the messages it requires from among the published messages. Among them, a message containing information on the bounding box is received through the $\textit{darknet\_ros}$ node. When the proposed DL model detects a staircase, the message from the LiDAR is subscribed to as a token that allows the UAV to perform actions and maneuvers based on the incoming output. This process continues for as long as detection is performed within $\textit{darknet\_ros}$.
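The gating behavior described above, in which LiDAR data are acted on only while a staircase is detected, can be sketched in plain Python without any ROS dependency. The class `StairGate`, its method names, and the label `"stairs"` are hypothetical; in an actual ROS node the two callbacks would be registered as subscribers to the bounding-box and scan topics, whose names and message types are deliberately not reproduced here.

```python
class StairGate:
    """Forward LiDAR scans to a handler only while stairs are detected."""

    def __init__(self, on_scan_allowed, target_class="stairs"):
        self._on_scan_allowed = on_scan_allowed  # called with accepted scans
        self._target_class = target_class        # hypothetical class label
        self._stairs_seen = False

    def bounding_boxes_callback(self, class_names):
        """Update the detection state from the labels of the latest boxes."""
        self._stairs_seen = self._target_class in class_names

    def scan_callback(self, ranges):
        """Pass the scan on if stairs are currently detected.

        Returns True when the scan was forwarded, False when it was dropped.
        """
        if self._stairs_seen:
            self._on_scan_allowed(ranges)
            return True
        return False
```

Keeping the gate separate from the ROS plumbing makes the logic easy to test offline with recorded detections and scans.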

4. Experimental Results and Analysis

A dataset was created at the Kumoh National Institute of Technology, South Korea, using a Bebop drone equipped with a high-resolution camera and GPS. The dataset comprises 1000 images at a resolution of $1920\times 1080$, resized to $428\times 428$ before model training. The dataset was split 70% and 30% for training and testing, respectively. Fig. 7 depicts the training phase of the proposed improved YOLOv3-tiny model, for which 20,000 epochs were set. In Fig. 7, the blue line represents the average loss (which reached 0.215), whereas the red line represents the mAP (which peaked at 91.6%).
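A 70%/30% split of the 1000 images yields 700 training and 300 test images. The short Python sketch below shows one way such a split could be made reproducibly; the shuffling, the seed, and the function name are assumptions, since the text does not describe how the images were assigned to each subset.

```python
import random


def split_dataset(items, train_fraction=0.7, seed=0):
    """Shuffle items with a fixed seed and split them into train/test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded for a reproducible split
    n_train = round(len(items) * train_fraction)
    return items[:n_train], items[n_train:]


# Image indices stand in for the 1000 image files of the dataset.
train_ids, test_ids = split_dataset(range(1000))
```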

The detection performance of the improved YOLOv3-tiny model was benchmarked against the default model by utilizing the same parametric configurations and dataset. The metrics used to reflect the efficacy of both models in stair detection are accuracy, recall, F1-score, and precision. Table 3 shows that the proposed improved YOLOv3-tiny model outperformed the default model in terms of accuracy, recall, and F1-score. Furthermore, the combination of a lower precision value with higher values of the other performance metrics indicates stable performance from the model.
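For reference, the four metrics named above can be computed from the counts of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) using their standard definitions, as in the Python sketch below. The function name is hypothetical, and any counts passed to it are illustrative inputs, not results from this paper.

```python
def detection_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1-score from raw counts.

    Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Because the F1-score is the harmonic mean of precision and recall, a model can improve its F1-score through higher recall even when its precision is slightly lower.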

Fig. 8 shows the real-time detection of the proposed model, where the top left image represents the starting point of the UAV after takeoff, and the top right image represents the middle position of the UAV when hovering and climbing. In Fig. 8, the bottom left image shows the last step of the stairs, while the bottom right image shows the instant when the UAV was located at a distance of $\textit{r}$ m from the stairs.

For the experimental scenario, the set of stairs climbed was 2.1 m long and 2.85 m wide, as shown in Fig. 3(d). Some of the results of the experiment based on Algorithm 1 are shown in Fig. 9, which depicts commands sent by the GCS and the corresponding images from the built-in camera of the UAV. Fig. 9 shows the different stages in the decisions made by the UAV, such as moving forward or upward, hovering, and going to the next stair to climb it. Furthermore, the actual trajectory of the UAV from the beginning of the staircase to the beginning of the next stair is shown in Fig. 10 as a 3D plot. This movement started at approximately 0.8 m from the starting point of the stairs. In total, 88 experiments were performed three times each, and the results are shown in Table 4; the average time elapsed from takeoff to landing was 55.97 s.

Fig. 7. Training phase of the improved YOLOv3-tiny.


Fig. 8. Detection results from the improved YOLOv3-tiny model.


Fig. 9. GCS screen commands and screen shots from the UAV’s built-in camera for (a) forward movement; (b) upward movement; (c) hovering; (d) going to the next stair.


Fig. 10. Trajectory of the UAV.


Table 3. Performance of the Detection Scheme.

Parameter metrics


(%) [17]

Modified YOLOv3-tiny














Table 4. Performance Time of the Proposed Stair-climbing Scheme.
















5. Conclusion

In this study, we designed, implemented, and experimented with a system in which a UAV recognizes and climbs stairs, which are obstacles often encountered during indoor flight. The system was implemented through a CNN-based imaging process for real-time stair recognition and by using LiDAR-based distance measurements. The accuracy derived from stair recognition was 92.06%, and the actual test results showed that stair climbing was carried out without collisions.

Future research will require more efficient algorithms to climb various types of stairs. Moreover, the proposed system can be combined with SLAM navigation to extend this work to systems that can fly autonomously across multiple floors.


This paper was supported by the National University Development Project in 2020.


[1] Prasad P. R., et al., 2018, Monocular vision aided autonomous UAV navigation in indoor corridor environments, IEEE Transactions on Sustainable Computing, Vol. 4, No. 1, pp. 96-108
[2] Lu Y., et al., 2018, A survey on vision-based UAV navigation, Geo-spatial Information Science, Vol. 21, No. 1, pp. 21-32
[3] Ilyas M., et al., Jul. 2018, Design of sTetro: A Modular, Reconfigurable, and Autonomous Staircase Cleaning Robot, Journal of Sensors, Vol. 2018, pp. 16
[4] Gao X., et al., 2017, Dynamics and stability analysis on stairs climbing of wheel-track mobile robot, International Journal of Advanced Robotic Systems, Vol. 14, No. 4, pp. 1729881417720783
[5] Israelsen J., et al., 2014, Automatic collision avoidance for manually tele-operated unmanned aerial vehicles, In 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 6638-6643
[6] Chnibo L., et al., 2013, UAV position estimation and collision avoidance using the extended Kalman filter, IEEE Transactions on Vehicular Technology, Vol. 62, No. 6, pp. 2749-2762
[7] Cui J., et al., 2016, Autonomous navigation of UAV in foliage environment, Journal of Intelligent & Robotic Systems, Vol. 84, No. 1, pp. 259-276
[8] Huizhong Z., et al., 2015, StructSLAM: Visual SLAM with building structure lines, IEEE Transactions on Vehicular Technology, Vol. 64, No. 4, pp. 1364-1375
[9] Oguz A. E., et al., Jun. 2014, On the consistency analysis of A-SLAM for UAV navigation, Proc. SPIE 9084, Unmanned Systems Technology XVI, Vol. 9084, pp. 90840R
[10] Ross S., et al., 2013, Learning monocular reactive UAV control in cluttered natural environments, In 2013 IEEE International Conference on Robotics and Automation, pp. 1765-1772
[11] Fraust A., et al., 2017, Automated aerial suspended cargo delivery through reinforcement learning, Artificial Intelligence, Vol. 247, pp. 381-398
[12] Imanberdiyev N., et al., 2016, Autonomous navigation of UAV by using real-time model-based reinforcement learning, In 2016 14th International Conference on Control, Automation, Robotics and Vision (ICARCV), pp. 1-6
[13] Sturm B. L., et al., 2019, Machine learning research that matters for music creation: A case study, Journal of New Music Research, Vol. 48, No. 1, pp. 36-55
[14] Raharjo J., et al., Nov. 2019, Cholesterol level measurement through iris image using gray level co-occurrence matrix and linear regression, ARPN Journal of Engineering and Applied Sciences, Vol. 14, No. 21, pp. 3757-3763
[15] Zhang Y., et al., Jan. 2020, Machine learning based video coding optimizations: A survey, Information Sciences, Vol. 506, pp. 395-423
[16] Heidari M., et al., Sep. 2020, Improving the performance of CNN to predict the likelihood of COVID-19 using chest X-ray images with preprocessing algorithms, International Journal of Medical Informatics, Vol. 144, pp. 104284
[17] Hassan S. A., et al., Oct. 2019, Real-time UAV detection based on deep learning network, In 2019 International Conference on Information and Communication Technology Convergence, pp. 630-632
[18] Redmon J., et al., 2016, You only look once: Unified, real-time object detection, In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779-788
[19] Mellinger D., et al., 2011, Minimum snap trajectory generation and control for quadrotors, In 2011 IEEE International Conference on Robotics and Automation, pp. 2520-2525
[20] Mellinger D., et al., Jan. 2012, Trajectory generation and control for precise aggressive maneuvers with quadrotors, The International Journal of Robotics Research, Vol. 31, No. 5, pp. 664-674
[21] Checchin P., et al., 2010, Radar scan matching SLAM using the Fourier-Mellin transform, In Field and Service Robotics, Vol. 62, pp. 151-161
[22] Engel J., et al., 2014, LSD-SLAM: Large-scale direct monocular SLAM, In European Conference on Computer Vision, Vol. 8690, pp. 834-849
[23] Mei C., et al., Jun. 2011, RSLAM: A system for large-scale mapping in constant-time using stereo, International Journal of Computer Vision, Vol. 94, No. 2, pp. 198-214
[24] Müller M., et al., Sep. 2011, Quadrocopter ball juggling, In 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5113-5120
[25] Huang A. S., et al., Aug. 2011, Visual odometry and mapping for autonomous flight using an RGB-D camera, Robotics Research, Vol. 100, pp. 235-252
[26] Roberts J. F., et al., Sep. 2007, Quadrotor using minimal sensing for autonomous indoor flight, In European Micro Air Vehicle Conference and Flight Competition (EMAV2007)
[27] Bry A., et al., May 2012, State estimation for aggressive flight in GPS-denied environments using onboard sensing, In 2012 IEEE International Conference on Robotics and Automation, pp. 1-8
[28] Bachrach A., et al., Dec. 2009, Autonomous flight in unknown indoor environments, International Journal of Micro Air Vehicles, Vol. 1, No. 4, pp. 217-228
[29] Achtelik M., et al., 2011, Onboard IMU and monocular vision based control for MAVs in unknown in- and outdoor environments, In 2011 IEEE International Conference on Robotics and Automation, pp. 3056-3063
[30] Blösch M., et al., 2010, Vision based MAV navigation in unknown and unstructured environments, In 2010 IEEE International Conference on Robotics and Automation, pp. 21-28
[31] Nützi G., et al., Nov. 2011, Fusion of IMU and vision for absolute scale estimation in monocular SLAM, Journal of Intelligent & Robotic Systems, Vol. 61, No. 1, pp. 287-299
[32] Weiss S., et al., 2012, Versatile distributed pose estimation and sensor self-calibration for an autonomous MAV, In 2012 IEEE International Conference on Robotics and Automation, pp. 31-38
[33] Rahim T., et al., 2021, A Deep Convolutional Neural Network for the Detection of Polyps in Colonoscopy Images, Biomedical Signal Processing and Control, Vol. 68, 102654


Yeonji Choi

Yeonji Choi received her BSc in Electrical Engineering in 2019 and her MSc from the Department of IT Convergence Engineering at Kumoh National Institute of Technology (KIT), Gumi, South Korea, in 2021. Currently, she is working as a graduate research assistant at the Wireless and Emerging Network System (WENS) Lab in the Department of IT Convergence Engineering, Kumoh National Institute of Technology (KIT), Gumi, South Korea. Her major research interests include intelligent control and systems, unmanned aerial vehicles, and wireless communications.

Tariq Rahim

Tariq Rahim is a PhD student in the Wireless and Emerging Network System Laboratory (WENS Lab) of the Department of IT Convergence Engineering, Kumoh National Institute of Technology, Republic of Korea. He completed his master’s degree in Information and Communication Engineering from Beijing Institute of Technology, PRC, in 2017. His research interests include image and video processing and quality of experience for high-resolution videos.

Soo Young Shin

Soo Young Shin received his BSc, MSc, and PhD in Electrical Engineering and Computer Science from Seoul National University, Korea, in 1999, 2001, and 2006, respectively. He was a visiting scholar for the FUN Lab at the University of Washington, U.S.A., from July 2006 to June 2007. After three years working in the WiMAX Design Lab of Samsung Electronics, he is now an associate professor for the School of Electronics at Kumoh National Institute of Technology, joining the institute in September 2010. His research interests include wireless LANs, WPANs, WBANs, wireless mesh networks, sensor networks, coexistence among wireless networks, industrial and military networks, cognitive radio networks, and next-generation mobile wireless broadband networks.