
2. (Graduate School of Culture Technology, Korea Advanced Institute of Science and Technology, Daejeon, Korea ghd3079@kaist.ac.kr )
3. (School of Computer Science and Electrical Engineering, Handong University, Pohang, Korea sshwang@handong.edu )

**Keywords:** Autonomous valet parking system, Fisheye lens, Keyframe, Path planning, Visual SLAM

## 1. Introduction

Currently, various ADAS (advanced driver assistance systems), such as lane departure warning, adaptive cruise control, and autonomous driving, are being developed. AVP (autonomous valet parking) is one such system: it automatically navigates a vehicle through the parking lot, parks it, and returns it to a position prescribed by the driver [1].

Several functions, such as vehicle localization, path generation, and parking bay recognition, are required to perform AVP, and various sensors have been suggested to obtain the necessary information. In most previous works, camera sensors were used to recognize the parking bay [2]. For the other functions, previous works use sensors such as LiDAR [3,4], GPS [5,6], and ultrasonic sensors [7]. However, there are several issues in using sensors other than cameras: GPS sensors cannot be used indoors, the accuracy of ultrasonic sensors is limited, and LiDAR sensors are too expensive to be widely adopted. Hence, it is preferable to perform AVP using camera sensors only.

Visual SLAM (Simultaneous Localization And Mapping) technology can potentially implement the essential functions for AVP. As its name implies, a visual SLAM system generates a map of the surrounding environment and localizes the vehicle within that map. From the generated map, a driving path can be estimated [8,9], and the vehicle can be localized. Furthermore, AVP operates in relatively confined places and at low speeds, conditions conducive to visual SLAM. However, a few issues need to be addressed when visual SLAM is used for AVP. Even though previous research focused on properly generating maps [10], ways to control vehicles based on visual SLAM were not suggested. In addition, visual SLAM itself does not consider the mechanical properties of vehicles, so a vehicle may not be able to move along the path that visual SLAM suggests. For example, a vehicle may fail to follow the trajectory because of the angular limit of its steering wheel. Scale estimation is another problem that needs to be resolved when a monocular camera is used for visual SLAM: the scale ratio between the map and the physical world is essential to express the exact distances required to create a parking path.

This research presents visual SLAM-based control of vehicles for AVP. We suggest waypoint-based control of vehicles, where the locations of keyframes serve as waypoints. The position of a keyframe may be unreachable under certain conditions; hence, unreachable areas are estimated first so that they can be handled. The vehicle is then controlled so that the target keyframe does not remain in an unreachable area while the trajectory is followed. Further, a scale line whose length is known in advance is used to estimate the scale ratio: the ratio is obtained by comparing the coordinate displacement of the vehicle on the map with the length of the scale line. However, the scale of the map can change as errors in the generated map accumulate. Hence, the scale is updated continuously to prevent error accumulation.

This paper is organized as follows. Section 2 introduces related works on visual SLAM and its application in autonomous driving systems. In Section 3, the detailed design of the proposed system is described: we first present an overview of the AVP system, followed by the keyframe-based control of vehicles, the scale ratio estimation between the SLAM map and the physical world, and the other modules. Section 4 presents the experimental results, and Section 5 concludes the paper.

## 2. Related Work

Many research works related to visual SLAM have been reported in the literature. We categorize them into general visual SLAM and visual SLAM for autonomous driving according to their environments and purposes.

### 2.1 General Visual SLAM

Most visual SLAM methods extract visual features from images, described by models such as FAST, SIFT, SURF, BRIEF, and ORB. Pixels that correspond to visual features are used for map point generation by two-view triangulation [11]. By tracking previously extracted features in the current frame, visual SLAM estimates the current camera pose. Among the various image features, ORB combines a FAST detector with a BRIEF descriptor and is computationally cheaper and more efficient than traditional features like SIFT and SURF [12]. ORB-SLAM was developed to enable mapping and localization using these advantages of the ORB feature [13] and is the most widely used technology in autonomous driving research.
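As a concrete illustration of the two-view triangulation mentioned above [11], the following sketch recovers a 3D map point from one feature correspondence using the linear (DLT) method. The projection matrices and image coordinates in the example are hypothetical, not values from the paper.

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) two-view triangulation of one correspondence.

    P1, P2 : 3x4 camera projection matrices
    x1, x2 : (u, v) image coordinates of the same point in each view
    Returns the 3D point in world coordinates.
    """
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The homogeneous solution is the right singular vector associated
    # with the smallest singular value of A.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]
```

For instance, with two normalized cameras separated by a 1 m baseline along the x-axis, a point 5 m ahead is recovered from its two projections.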

Deep learning-based features have been suggested recently [14], and they report better performance in coping with perspective distortion. However, the use of these features requires the utilization of GPUs. End-to-end SLAM techniques have also been suggested [15-17]; however, their performance is limited thus far [18].

### 2.2 Visual SLAM for Autonomous Driving

Fisheye cameras are commonly used in ADAS to minimize the number of cameras while maximizing the field of view around a vehicle. However, fisheye cameras suffer greater radial distortion in exchange for their wider angle of view. To handle radial distortion, Cubemap SLAM has been suggested [19]. It uses the same pipeline as ORB-SLAM, except that the 2D image is projected onto a cube in 3D space, and features are then extracted from the unfolded faces of the cube map.
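To make the cube projection concrete, the sketch below maps a viewing-ray direction onto one of six cube faces, in the spirit of the piecewise-pinhole model described for Cubemap SLAM [19]. The face naming and coordinate conventions are our own assumptions, not the paper's.

```python
def cube_face_coord(dx, dy, dz):
    """Map a viewing-ray direction to (face, u, v) on a unit cube.

    u and v lie in [-1, 1] on the chosen face; each face behaves like an
    ordinary pinhole camera, which is why standard feature extraction can
    run on the unfolded faces.
    """
    ax, ay, az = abs(dx), abs(dy), abs(dz)
    if az >= ax and az >= ay:            # ray dominated by the z-axis
        face = "front" if dz > 0 else "back"
        return face, dx / az, dy / az
    if ax >= ay:                         # ray dominated by the x-axis
        face = "right" if dx > 0 else "left"
        return face, dz / ax, dy / ax
    face = "top" if dy > 0 else "bottom" # ray dominated by the y-axis
    return face, dx / ay, dz / ay
```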

Certain works in the literature [20,21] detect features of objects observed in driving environments, such as road signs, to enhance the accuracy of visual SLAM for autonomous driving. AVP-SLAM [10] has also been suggested and performs well, especially in parking lots: with the help of a neural network, it segments areas characteristic of the parking lot environment, such as speed bumps, parking lines, and kerbs. However, it utilizes extra sensors (ultrasonic sensors and a wheel encoder) to increase localization accuracy. Furthermore, the aforementioned methods cannot be considered complete AVP systems because they lack visual SLAM-based vehicle control and mandatory functions such as path planning.

## 3. The Proposed Scheme

### 3.1 System Overview

Fig. 1 illustrates the overview of the proposed valet parking system. The system is largely divided into four parts: road sign-based vehicle control, keyframe-based vehicle control, autonomous parking, and return driving. In the road sign-based vehicle control stage, the vehicle basically drives straight, and road signs such as arrows on the ground of the parking lot are recognized to control the steering. In this way, the vehicle circles the parking lot while SLAM generates a map of it. When the vehicle returns to the starting position, loop closing is performed to minimize the accumulated error of the map. From this moment, the vehicle is no longer controlled by road sign recognition but by the keyframes created in the SLAM map. Furthermore, autonomous parking is performed if an empty parking bay is detected during keyframe-based driving. Finally, when the user calls the parked vehicle, it returns to the position from which the user called it.
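The four stages and the events that move the system between them can be summarized as a small state machine; the state and event names below are illustrative labels, not identifiers from the actual system.

```python
# Stage transitions of the proposed AVP system: road sign-based control
# until loop closing, keyframe-based control until an empty bay is found,
# parking, and finally return driving when the user calls the vehicle.
TRANSITIONS = {
    ("road_sign_control", "loop_closed"): "keyframe_control",
    ("keyframe_control", "empty_bay_detected"): "autonomous_parking",
    ("autonomous_parking", "user_call"): "return_driving",
}

def next_state(state, event):
    """Advance to the next stage; unrecognized events leave the stage unchanged."""
    return TRANSITIONS.get((state, event), state)
```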

The proposed system simultaneously performs autonomous driving, scale ratio calculation, and parking bay detection. The scale ratio is used when an actual distance in the physical world needs to be represented in the SLAM map. Also, parking bay detection based on line detection is performed and is used in the parking procedure with the scale ratio.

The proposed system uses four camera sensors attached in four directions (front, rear, left, and right). Visual SLAM operates on the front camera; the other cameras are used to detect empty parking bays and execute the parking algorithm accordingly. The vehicle drives at a constant speed, and the length of the scale line is assumed to be known in advance. Finally, a LUT (lookup table) is used to apply the AVM (around view monitor) technique efficiently when correcting the radial distortion of fisheye images. In other words, the entire AVM pipeline is not executed; only the coordinate relationships between the images with and without distortion correction are used, which reduces the computational cost.

### 3.2 Keyframe-based Vehicle Control System

The proposed system performs fully autonomous driving throughout the entire process, and the driving method varies depending on the sufficiency of the map data. If the information generated by SLAM is sufficient to understand the structure of the parking lot, it is used to drive the vehicle; before that, the vehicle must be driven using other information. This paper sets the occurrence of loop closing as the criterion for the sufficiency of map data that distinguishes these two stages.

After loop closing occurs, autonomous driving is carried out using the keyframe data in the SLAM map. A keyframe represents the position of the camera at an input frame that could be distinguished from the others while the map was generated. Therefore, following the keyframes in the order in which they were created allows the vehicle to retrace the trajectory it traveled before loop closing.

Autonomous driving using this method is performed in four steps.

1) Sort the keyframes according to the time at which they were created.

2) Set one target keyframe at which the vehicle should arrive. At this time, the target keyframe must be located ahead of the vehicle because the vehicle must drive forward. Therefore, the closest keyframe is chosen as the target keyframe among those located ahead of the vehicle.

3) Calculate the coordinate of the target keyframe with respect to the vehicle coordinate system.

4) Control the steering of the vehicle so that it can be driven towards the target keyframe.

The fourth step is repeated until the vehicle reaches the target keyframe, and then the second to the fourth steps are repeated until an empty parking bay is found.
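The four steps above can be sketched as follows; the keyframe format, the coordinate conventions, and the proportional steering gain are assumptions made for illustration, not details from the paper.

```python
import math

def target_keyframe(keyframes, pose):
    """Pick the nearest keyframe located ahead of the vehicle.

    keyframes : list of (t, x, z) map positions, in any order
    pose      : (x, z, heading) of the vehicle; heading in radians,
                measured from the map's +z axis
    Returns (forward, lateral) of the chosen keyframe in the vehicle
    frame (lateral positive to the left), or None if none lies ahead.
    """
    vx, vz, th = pose
    best, best_d = None, float("inf")
    for _, kx, kz in sorted(keyframes):              # step 1: creation order
        dx, dz = kx - vx, kz - vz
        # step 3: rotate the offset into the vehicle coordinate system
        fwd = math.cos(th) * dz + math.sin(th) * dx
        lat = math.cos(th) * dx - math.sin(th) * dz
        d = fwd * fwd + lat * lat
        if fwd > 0 and d < best_d:                   # step 2: ahead and closest
            best, best_d = (fwd, lat), d
    return best

def steering_command(fwd, lat, gain=1.0):
    """Step 4: steer proportionally to the bearing toward the target."""
    return gain * math.atan2(lat, fwd)
```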

To follow the trajectory identically, the steering at the moment each keyframe was created would also have to be considered; the method above, however, only follows the positions of the keyframes. There is therefore a limitation in that the trajectory may not be drivable exactly. Hence, we designed an exception handling that allows the vehicle to correct its location when it deviates from the trajectory. To detect such cases, the circle passing through three consecutive positions $(x_1, z_1)$, $(x_2, z_2)$, and $(x_3, z_3)$ is computed: Eq. (1) gives its center $(x_c, z_c)$ and Eq. (2) its radius $r$, which can be compared with the vehicle's minimum turning radius to estimate the unreachable areas.

##### (1)
$$\left[\begin{array}{l} x_{c} \\ z_{c} \end{array}\right]=\frac{1}{2}\left[\begin{array}{ll} x_{2}-x_{1} & z_{2}-z_{1} \\ x_{3}-x_{2} & z_{3}-z_{2} \end{array}\right]^{-1}\left[\begin{array}{l} x_{2}^{2}-x_{1}^{2}+z_{2}^{2}-z_{1}^{2} \\ x_{3}^{2}-x_{2}^{2}+z_{3}^{2}-z_{2}^{2} \end{array}\right]$$
##### (2)
$$r=\sqrt{\left(x_{c}-x_{1}\right)^{2}+\left(z_{c}-z_{1}\right)^{2}}$$
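Eqs. (1) and (2) translate directly into a reachability test; in this sketch, the `min_turn_radius` parameter is an assumed vehicle property, not a value given in the paper.

```python
def turning_circle(p1, p2, p3):
    """Center (x_c, z_c) and radius r of the circle through three (x, z)
    positions, following Eqs. (1) and (2)."""
    (x1, z1), (x2, z2), (x3, z3) = p1, p2, p3
    a, b = x2 - x1, z2 - z1
    c, d = x3 - x2, z3 - z2
    det = a * d - b * c
    if abs(det) < 1e-12:
        return None, float("inf")        # collinear points: straight driving
    e = (x2 ** 2 - x1 ** 2 + z2 ** 2 - z1 ** 2) / 2.0
    f = (x3 ** 2 - x2 ** 2 + z3 ** 2 - z2 ** 2) / 2.0
    # Explicit 2x2 inverse applied to the right-hand side of Eq. (1)
    xc = (d * e - b * f) / det
    zc = (a * f - c * e) / det
    r = ((xc - x1) ** 2 + (zc - z1) ** 2) ** 0.5     # Eq. (2)
    return (xc, zc), r

def is_unreachable(p1, p2, p3, min_turn_radius):
    """A target is unreachable when the required turning radius is tighter
    than the vehicle's minimum turning radius (an assumed parameter)."""
    _, r = turning_circle(p1, p2, p3)
    return r < min_turn_radius
```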

If the target keyframe is in unreachable areas, the vehicle is controlled to correct its location. In this case, the vehicle reverses in the opposite direction of the target keyframe’s position with respect to the vehicle’s coordinate system. For example, if the target keyframe is unreachable and is in front of the left side of the vehicle, then the vehicle reverses to the right rear. This correction process is only carried out until the keyframe is out of the unreachable areas, and the vehicle follows the trajectory again after that.
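The correction rule above, reversing to the side opposite the unreachable target, reduces to a sign check on the target's lateral offset in the vehicle frame. In this illustrative sketch, positive `lat` is assumed to mean the target is to the left:

```python
def correction_direction(lat):
    """Reverse away from an unreachable target keyframe: a target at the
    front-left (lat > 0) makes the vehicle reverse to the right rear, and
    a target at the front-right to the left rear."""
    return "right-rear" if lat > 0 else "left-rear"
```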

### 3.3 Scale Ratio Estimation

The initial scale ratio between the SLAM map and the physical world is obtained at the beginning of the system's operation and is used in every process involving the SLAM map. The vehicle thus continues driving with knowledge of the scale ratio between the SLAM map and the real world.

Scale update is the process of constantly modifying the relational expression between the actual length and the SLAM coordinates because of scale changes in the SLAM map. SLAM map is created based on the coordinates of matched feature points. If the errors in feature point matching accumulate, the scale of the SLAM map may change. As a result, the initial scale value may be unsuitable for the current SLAM map.

To enable scale updates, the time during which the scale line is detected is measured when the initial scale is obtained. When a scale update is needed, the vehicle's displacement on the SLAM map over that same time interval is measured again. Since an accurate scale calculation requires displacements of the vehicle traveling at the same speed, this process is performed only during straight driving; the distance traveled during curved driving is difficult to calculate.

The new scale is calculated from the distance the vehicle travels on the newly obtained SLAM map and the distance it traveled when the initial scale was calculated. The updated scale is given by Eq. (3), which keeps the scale consistent with the SLAM map even as errors accumulate.

##### (3)
$$\left(\text { Scale }_{\text {new }}\right)=\left(\text { Scale }_{\text {existing }}\right) * \frac{\text { Distance }_{\text {existing }}}{\text { Distance }_{\text {new }}}$$
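Eq. (3) translates directly into code; a minimal sketch:

```python
def update_scale(scale_existing, distance_existing, distance_new):
    """Eq. (3): rescale when the SLAM-map displacement measured over the
    same straight-driving interval changes.

    distance_existing : map displacement observed when the initial scale
                        was computed
    distance_new      : map displacement observed now, over the same time
                        interval at the same speed
    """
    return scale_existing * distance_existing / distance_new
```

For example, if the map displacement over the reference interval shrinks from 10 to 8 map units, a scale of 2.0 grows to 2.5 so that real-world distances stay correct.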

### 3.4 Other Modules

Road sign recognition and parking lot detection are assumed to be performed on AVM images. However, since the cameras attached to the vehicle are fisheye cameras, the severe radial distortion makes it impossible to apply the edge- and line-detection algorithms needed for parking bay detection directly. Therefore, the distortion is corrected, and the corrected views are merged into an aerial AVM image to identify the parking bay. However, this merging process has a high computational cost and is unsuitable for a real-time parking system. Hence, the proposed system employs a LUT that contains the mapping relationships between AVM images and fisheye images. After storing in the LUT the ROI of the AVM image that needs to be processed, the system performs parking line detection by accessing the pixel values at the corresponding coordinates of the fisheye image through the LUT. In other words, real-time performance is guaranteed because the mapping information is computed once, stored in the LUT, and only read when necessary, which significantly reduces processing time.
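A minimal sketch of the LUT idea follows; the `avm_to_fisheye` callable is a hypothetical stand-in for the camera's distortion/projection model, which is not specified in the paper.

```python
def build_lut(roi, avm_to_fisheye):
    """Precompute, for every AVM ROI pixel, the corresponding fisheye
    pixel coordinate.

    roi            : (u0, v0, u1, v1) bounds of the AVM region of interest
    avm_to_fisheye : callable (u, v) -> (fu, fv); stands in for the
                     distortion-correction model (an assumption here)
    """
    return {(u, v): avm_to_fisheye(u, v)
            for u in range(roi[0], roi[2])
            for v in range(roi[1], roi[3])}

def sample(lut, fisheye_image, u, v):
    """Read the AVM pixel (u, v) directly from the raw fisheye image,
    skipping any per-frame undistortion or stitching."""
    fu, fv = lut[(u, v)]
    return fisheye_image[fv][fu]
```

Because the mapping is computed once at startup, each frame only pays for dictionary lookups and pixel reads, which is what makes real-time line detection feasible.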

The actual parking process is implemented in five steps. The first step locates the vacant parking bay after positively identifying it through the aforementioned process; in this step, the steering angle of the wheels is turned in the direction opposite to the parking bay. In the second step, the vehicle is driven forward a certain distance that depends on the type of vehicle, while its current location is compared with the destination indicated on the map so that the vehicle stays within the required distance. The third step is carried out after the vehicle has moved to the destination a certain distance away: the wheels are steered toward the parking bay, and the vehicle reverses. In the fourth step, once the vehicle becomes parallel to the parking line while reversing, the steering angle is restored to straight (i.e., parallel to the parking line as well). The fisheye cameras attached to the left and right of the vehicle are used to determine whether the vehicle is parallel to the parking line: the slopes of the detected lines when the vehicle is parallel are obtained in advance, and if the two slopes match, the vehicle is considered parallel. In the final step, while the vehicle reverses parallel to the parking line, the rear camera continuously detects the line behind the parking bay (i.e., the baseline) and monitors the gap between the vehicle and that line. When this gap decreases below a certain threshold, the vehicle is considered to be correctly located inside the parking bay and is stopped, and the parking algorithm terminates.
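The parallelism and stop checks in the last two steps can be sketched as follows; the tolerance and threshold values are illustrative assumptions, not values from the paper.

```python
def is_parallel(slope_left, slope_right, ref_left, ref_right, tol=0.05):
    """The vehicle is treated as parallel to the parking line when the
    slopes seen by both side cameras match the reference slopes recorded
    while the vehicle was known to be parallel. `tol` is an assumed
    tolerance."""
    return (abs(slope_left - ref_left) < tol
            and abs(slope_right - ref_right) < tol)

def should_stop(gap_to_baseline, threshold):
    """Stop reversing once the rear camera's measured gap to the baseline
    behind the parking bay falls below the threshold."""
    return gap_to_baseline < threshold
```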

## 4. Experiments

### 4.1 Implementation Environment

The environment used to implement the proposed system comprised an Intel(R) Core(TM) i7-7700HQ CPU, 8 GB RAM, and Ubuntu 16.04. The experiments were conducted both indoors and outdoors to simulate an actual parking lot, with the road signs and parking bays scaled down in proportion to the vehicle. The distances used in the parking algorithm were likewise calculated and adjusted to the turning characteristics of the HENES T870 model. Finally, a Nucleo board was used to send the vehicle control signals.

### 4.2 Keyframe-based Autonomous Driving

Two errors, namely the translation and rotation errors, were measured to evaluate the accuracy of the control algorithm used in keyframe-based autonomous driving. They were measured when the vehicle returned to its original position after loop closing: the translation error is the position difference, and the rotation error the angle difference, between the initial and the returned driving. Driving was considered successful if the translation error was less than 10 cm and the rotation error was less than 15 degrees.

The experiment was performed fifteen times at the three locations shown in Fig. 3. The translation measurement reported an 80% success rate and the rotation measurement an 86.67% success rate; the results are shown in Table 1.

The experiment was conducted in indoor and outdoor environments of various sizes and showed a high accuracy and success rate. In comparison, most autonomous driving methods for autonomous valet parking systems rely mainly on costly infrastructure. For example, Huang et al. [22] proposed autonomous driving that detects infrastructure, such as marks on the wall, to follow a route the vehicle has already traveled and thereby achieve high accuracy. The results in Table 1 show that similar results can be produced without using any infrastructure for driving route creation and tracking, even though there is a slight difference in performance between the two works. This comparison indicates that autonomous driving using only SLAM data works without significant error. Therefore, the experimental results show that keyframe-based vehicle control can be applied to AVP, which repeats driving within the same environment.

##### Table 1. Result of the keyframe-based autonomous driving experiment.

| Movement | Success Rate (%) |
| --- | --- |
| Translation | 80 |
| Rotation | 86.67 |

### 4.3 Parking

In this experiment, the success rate and accuracy of the parking algorithm were measured. The experiments, performed fifteen times, were conducted both indoors, where light reflects off the ground, and outdoors on asphalt, a typical parking lot surface, to check the robustness of the parking line detection. The parking bay was placed only on the left side of the vehicle so that the experiments could be repeated under the same conditions.

A trial was considered successful if the distance between the vehicle and the nearest side parking line was less than 18 cm, as shown in Fig. 4; this value was set by considering the sizes of the parking bay and the vehicle. As shown in Table 2, the indoor experiments reported a success rate of about 66%, and the outdoor experiments about 73%. Parking line detection and template matching were performed normally in both environments. However, if the vehicle could not move exactly the required distance during the parking algorithm, it was parked incorrectly, as shown in Fig. 5. This problem appears to be caused by an incorrect scale ratio between the map and the physical world. The other processes, such as checking parallelism and parking completion, were performed normally.

##### Table 2. Result of the autonomous parking experiment.

| Environment | Success Rate (%) |
| --- | --- |
| Indoor | 66.67 |
| Outdoor | 73.33 |

## 5. Conclusion

This research proposes a vehicle control method for a fully autonomous valet parking system. The proposed method is based on visual SLAM, and the vehicle drives by tracing a trajectory through the keyframes in the SLAM map. We also proposed estimating the scale ratio between the SLAM map and the physical world by introducing a scale line and updating the scale consistently.

As future work, scale estimation without an auxiliary scale line will be studied. Furthermore, as loop closing is not always guaranteed, visual SLAM-based control without loop closing will also be studied. Lastly, the proposed system will be tested on a real car for commercialization.

### ACKNOWLEDGMENTS

This work was supported by a Korea Evaluation Institute of Industrial Technology (KEIT) grant funded by the Korea government (Ministry of Trade, Industry and Energy) in 2020 (No. 20009775, Development of AI based Around View Monitoring SoC for Automated Valet Parking).

### REFERENCES

1. Khalid M., Wang K., Aslam N., Cao Y., Ahmad N., Khan M. K., February 2021, From smart parking towards autonomous valet parking: A survey, challenges and future works, Journal of Network and Computer Applications, Vol. 175
2. Han S. J., Choi J., December 2015, Parking Space Recognition for Autonomous Valet Parking Using Height and Salient-Line Probability Maps, ETRI Journal, Vol. 37, No. 6, pp. 1220-1230
3. Yin H., Wang Y., Tang L., Ding X., Huang S., Xiong R., February 2021, 3D LiDAR Map Compression for Efficient Localization on Resource Constrained Vehicles, IEEE Transactions on Intelligent Transportation Systems, Vol. 22, No. 2, pp. 837-852
4. Zhang J., Singh S., July 2014, LOAM: Lidar Odometry and Mapping in Real-time, Robotics: Science and Systems, Vol. 2, No. 9
5. Agrawal M., Konolige K., 2006, Real-time localization in outdoor environments using stereo vision and inexpensive GPS, 18th International Conference on Pattern Recognition (ICPR'06), pp. 1063-1068
6. Tseng P. K., Hung M. H., Yu P. K., Chang S. W., Wang T. W., September 2014, Implementation of an autonomous parking system in a parking lot, 2014 World Congress on Intelligent Transport Systems
7. Luca R., Troester F., Gall R., Simon C., January 2010, Autonomous Parking Procedures Using Ultrasonic Sensors, Annals of DAAAM & Proceedings
8. Kummerle R., Hahnel D., Dolgov D., Thrun S., Burgard W., 2009, Autonomous driving in a multi-level parking structure, 2009 IEEE International Conference on Robotics and Automation, pp. 3395-3400
9. Song J., Zhang W., Wu X., Cao H., Gao Q., Luo S., September 2019, Laser-based SLAM automatic parallel parking path planning and tracking for passenger vehicle, IET Intelligent Transport Systems, Vol. 13, No. 10, pp. 1557-1568
10. Qin T., Chen T., Chen Y., Su Q., 2020, AVP-SLAM: Semantic visual mapping and localization for autonomous vehicles in the parking lot, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5939-5945
11. Hartley R., Zisserman A., 2004, Multiple View Geometry in Computer Vision, Cambridge University Press, pp. 237-262
12. Rublee E., Rabaud V., Konolige K., Bradski G., 2011, ORB: An efficient alternative to SIFT or SURF, 2011 International Conference on Computer Vision, pp. 2564-2571
13. Mur-Artal R., Montiel J. M. M., Tardós J. D., October 2015, ORB-SLAM: A versatile and accurate monocular SLAM system, IEEE Transactions on Robotics, Vol. 31, No. 5, pp. 1147-1163
14. Li H., Xiong P., Fan H., Sun J., 2019, DFANet: Deep feature aggregation for real-time semantic segmentation, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9522-9531
15. Iyer G., Krishna Murthy J., Gupta G., Krishna M., Paull L., 2018, Geometric consistency for self-supervised end-to-end visual odometry, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 267-275
16. Zhan H., Garg R., Weerasekera C. S., Li K., Agarwal H., Reid I., 2018, Unsupervised learning of monocular depth estimation and visual odometry with deep feature reconstruction, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 340-349
17. Sheng L., Xu D., Ouyang W., Wang X., 2019, Unsupervised collaborative learning of keyframe detection and visual odometry towards monocular deep SLAM, Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4302-4311
18. Zhan H., Weerasekera C. S., Bian J. W., Reid I., May 2020, Visual odometry revisited: What should be learnt?, 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 4203-4210
19. Wang Y., Cai S., Li S. J., Liu Y., Guo Y., Li T., Cheng M. M., December 2018, CubemapSLAM: A piecewise-pinhole monocular fisheye SLAM system, Asian Conference on Computer Vision, pp. 34-49
20. Schreiber M., Knöppel C., Franke U., June 2013, LaneLoc: Lane marking based localization using highly accurate maps, 2013 IEEE Intelligent Vehicles Symposium (IV), pp. 449-454
21. Ranganathan A., Ilstrup D., Wu T., November 2013, Light-weight localization for vehicles using road markings, 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 921-927
22. Huang Y., Zhao J., He X., Zhang S., Feng T., 2018, Vision-based Semantic Mapping and Localization for Autonomous Indoor Parking, 2018 IEEE Intelligent Vehicles Symposium (IV), pp. 636-641

## Authors

##### Young Gon Jo

Young Gon Jo received his B.S. degree in computer science and electrical engineering from Handong University in 2021. His research interest includes image processing, SLAM, and autonomous driving technology.

##### Seok Hyeon Hong

Seok Hyeon Hong received his B.S. degree in computer science and engineering from Handong University in 2021. His research interest includes computer graphics and computer vision.

##### Jeong Mok Ha

Jeong Mok Ha received his B.S. degree in electrical engineering from Pusan National University in 2010 and a Ph.D. degree from Pohang University of Science and Technology (POSTECH) in 2017. He is interested in automotive vision, including camera calibration, surround view, deep learning, and SLAM.

##### Sung Soo Hwang

Sung Soo Hwang received his B.S. degree in electrical engineering and computer science from Handong University in 2008 and M.S. and Ph.D. degrees from Korea Advanced Institute of Science and Technology in 2010 and 2015, respectively. His research interest includes the creation and operation of video maps for augmented reality and autonomous driving.