Young Gon Jo¹, Seok Hyeon Hong², Jeong Mok Ha¹, and Sung Soo Hwang³
- ¹ Algorithm Team, VADAS Co., Ltd., Pohang, Korea ({ygjo, jmha}@vadas.co.kr)
- ² Graduate School of Culture Technology, Korea Advanced Institute of Science and Technology, Daejeon, Korea (ghd3079@kaist.ac.kr)
- ³ School of Computer Science and Electrical Engineering, Handong University, Pohang, Korea (sshwang@handong.edu)
Copyright © The Institute of Electronics and Information Engineers (IEIE)
Keywords
Autonomous valet parking system, Fisheye lens, Keyframe, Path planning, Visual SLAM
1. Introduction
Various ADAS (advanced driver assistance systems), such as lane departure warning, adaptive cruise control, and autonomous driving, are currently being developed. AVP (autonomous valet parking) is one such system: it automatically navigates a vehicle through a parking lot, parks it, and returns it to the position prescribed by the driver [1].
Several functions, such as vehicle localization, path generation, and parking bay recognition, are required to perform AVP, and several kinds of sensors have been suggested to obtain the necessary information. In most previous works, camera sensors were used to recognize the parking bay [2]. For the other functions, previous works use sensors such as LiDAR [3,4], GPS [5,6], and ultrasonic sensors [7]. However, there are several issues with sensors other than cameras: GPS sensors cannot be used indoors, the accuracy of ultrasonic sensors is limited, and LiDAR sensors are too expensive to be widely adopted. Hence, it is preferable to perform AVP using camera sensors only.
Visual SLAM (simultaneous localization and mapping) technology can potentially implement the essential functions for AVP. As its name implies, visual SLAM generates a map of the surrounding environment and localizes the vehicle within that map. From the generated map, a driving path can be estimated [8,9], and the vehicle can be localized along it. Furthermore, AVP operates in relatively confined spaces and at low speeds, conditions conducive to visual SLAM. However, a few issues need to be addressed when visual SLAM is used for AVP. Previous research has focused on properly generating maps [10], but ways to control vehicles based on visual SLAM were not suggested. In addition, visual SLAM itself does not consider the mechanical properties of vehicles, and a vehicle may not be able to move along the path visual SLAM suggests. For example, a vehicle may fail to follow the trajectory because of the angular limit of its steering wheel. Scale estimation is another problem that needs to be resolved when a monocular camera is used for visual SLAM, because the scale ratio between the map and the physical world is essential for expressing the exact distances required to create a parking path.
This research presents visual SLAM-based control of vehicles for AVP. We suggest waypoint-based control of vehicles, where the locations of keyframes serve as waypoints. Since the position of a keyframe may be unreachable under certain conditions, unreachable areas are estimated first. The vehicle is then controlled so that the target keyframe does not fall inside the unreachable areas while the trajectory is followed. Further, a scale line whose length is known in advance is used to estimate the scale ratio: the ratio is obtained by comparing the coordinate displacement of the vehicle on the map with the length of the scale line. However, the scale of the map can change as errors in the generated map accumulate, so the scale is updated continuously to prevent error accumulation.
This paper is organized as follows. Section 2 introduces related works on visual SLAM and its application to autonomous driving systems. Section 3 describes the detailed design of the proposed system: we first present an overview of the AVP system, including the proposed solution, and then present keyframe-based control of vehicles, scale ratio estimation between the SLAM map and the physical world, and the other modules. Section 4 presents several experimental results, and Section 5 concludes the paper.
2. Related Work
Many research works related to visual SLAM have been reported in the literature. We categorize them into general visual SLAM and visual SLAM for autonomous driving according to their environments and purposes.
2.1 General Visual SLAM
Most visual SLAM methods extract visual features from images and describe them with models such as FAST, SIFT, SURF, BRIEF, and ORB. Pixels that correspond to visual features are used for map point generation by two-view triangulation [11]. By tracking previously extracted features in the current frame, visual SLAM estimates the current camera pose. Among the various image features, ORB combines a FAST detector with a BRIEF descriptor and is computationally cheaper and more efficient than traditional features like SIFT and SURF [12]. ORB SLAM was developed to enable mapping and localization using these advantages of the ORB feature [13] and is the most widely used technology in autonomous driving research.
Deep learning-based features have been suggested recently [14] and report better performance in coping with perspective distortion; however, using them requires a GPU. End-to-end SLAM techniques have also been suggested [15-17], but their performance is limited thus far [18].
2.2 Visual SLAM for Autonomous Driving
Fisheye cameras are usually used for ADAS to minimize the number of cameras while maximizing the field of view around a vehicle. However, fisheye cameras suffer from severe radial distortion in exchange for their wide angle of view. To handle radial distortion, Cubemap SLAM has been suggested [19]. It uses the same pipeline as ORB SLAM, except that it projects the 2D image onto a cube in 3D space and then extracts features from the unfolded faces of the cube map.
Certain works in the literature [20,21] detect features of objects observed in driving environments, such as road signs, to enhance the accuracy of visual SLAM for autonomous driving. AVP-SLAM [10] has also been suggested and performs well, especially in parking lots. It segments areas with characteristics of the parking lot environment, such as speed bumps, parking lines, and kerbs, with the help of a neural network. However, it utilizes extra sensors (ultrasonic sensors and a wheel encoder) to increase localization accuracy. Furthermore, the aforementioned methods cannot be considered complete AVP systems because they provide no visual SLAM-based vehicle control or mandatory functions like path planning.
3. The Proposed Scheme
3.1 System Overview
Fig. 1 illustrates the overview of the proposed valet parking system. The system is largely divided into four parts: road sign-based vehicle control, keyframe-based vehicle control, autonomous parking, and return driving. In the road sign-based vehicle control stage, the vehicle basically drives straight, and road signs such as arrows on the ground of the parking lot are recognized to control the steering. In this way, the vehicle circles the parking lot, and SLAM is operated to generate a map of it. When the vehicle returns to the starting position, loop closing is performed to minimize the accumulated error of the map. From this moment, the vehicle is no longer controlled by road sign recognition but by the keyframes created in the SLAM map. Furthermore, autonomous parking is performed if an empty parking bay is detected during keyframe-based driving. Finally, when the user calls the parked vehicle, it returns to the position from which the user called it.
The proposed system simultaneously performs autonomous driving, scale ratio calculation,
and parking bay detection. The scale ratio is used when an actual distance in the
physical world needs to be represented in the SLAM map. Also, parking bay detection
based on line detection is performed and is used in the parking procedure with the
scale ratio.
The proposed system uses four camera sensors facing four directions (front, rear, left, and right). Visual SLAM is operated using the front camera; the other cameras are used to detect empty parking bays and execute the parking algorithm accordingly. The vehicle drives at a constant speed, and the length of the scale line is assumed to be known in advance. Finally, a LUT (lookup table) is used to apply the AVM (around view monitor) technique efficiently when correcting radial distortion in fisheye images. In other words, the entire AVM pipeline is not executed; instead, the coordinate relationships between the distorted and distortion-corrected images are stored and reused to reduce the computational cost.
3.2 Keyframe-based Vehicle Control System
The proposed system performs fully autonomous driving throughout the entire process, and the way of driving varies depending on the sufficiency of map data. If the information generated by SLAM is sufficient to understand the structure of the parking lot, it is used to drive the vehicle; before that, the vehicle must be driven using other information. This paper sets the occurrence of loop closing as the criterion for the sufficiency of map data that distinguishes these two stages.
After loop closing occurs, autonomous driving is carried out using the keyframe data in the SLAM map. Keyframes represent the camera position at input frames that could be distinguished from others while the map was generated. Therefore, following the keyframes in the order in which they were created allows the vehicle to retrace the trajectory it traveled before loop closing.
Autonomous driving using this method is performed in 4 steps.
1) Sort keyframes in order according to the timing they are created.
2) Set one target keyframe at which the vehicle should arrive. At this time, the target
keyframe must be located ahead of the vehicle because the vehicle must drive forward.
Therefore, the closest keyframe is chosen as the target keyframe among those located
ahead of the vehicle.
3) Calculate the coordinate of the target keyframe with respect to the vehicle coordinate
system.
4) Control the steering of the vehicle so that it can be driven towards the target
keyframe.
The fourth step is repeated until the vehicle reaches the target keyframe, and then
the second to the fourth steps are repeated until an empty parking bay is found.
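As a concrete illustration of these four steps, the following minimal sketch (illustrative only, not the authors' implementation) assumes the keyframes are already sorted by creation time and that hypothetical interfaces get_pose() and set_steering() expose the vehicle's SLAM pose projected onto the xz-plane and the steering actuator, respectively.

```python
import math

def follow_keyframes(keyframes_xz, get_pose, set_steering, arrival_radius=0.5):
    """Drive along the mapped trajectory by visiting keyframes in creation order.

    keyframes_xz   : keyframe positions (x, z) on the SLAM map, sorted by creation time (step 1)
    get_pose       : hypothetical callback returning the vehicle pose (x, z, heading) on the map
    set_steering   : hypothetical actuator callback taking a steering command in radians
    arrival_radius : distance (map units) at which the target keyframe counts as reached
    """
    for kf in keyframes_xz:
        while True:
            x, z, heading = get_pose()
            dx, dz = kf[0] - x, kf[1] - z
            # Step 3: express the target keyframe in the vehicle coordinate system
            # (forward along the heading direction, "left" positive to the vehicle's left).
            forward = dx * math.cos(heading) + dz * math.sin(heading)
            left = -dx * math.sin(heading) + dz * math.cos(heading)
            if forward <= 0.0:
                break  # step 2: keyframe lies behind the vehicle, take the next one
            if math.hypot(forward, left) < arrival_radius:
                break  # target reached, move on to the next keyframe
            set_steering(math.atan2(left, forward))  # step 4: steer towards the target
```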
To follow the trajectory identically, the steering at the moment each keyframe was created would also have to be considered, whereas the method described above only follows the positions of the keyframes. This limitation means it may be impossible to drive along the trajectory exactly. Hence, we designed an exception handling step that allows the vehicle to correct its location when it deviates from the trajectory.
For this exception handling, the vehicle's radius of rotation must be estimated, and the unreachable areas are then established from it. As shown in Fig. 2, if the vehicle's radius of rotation is r, the interiors of the circles of radius r centered on (-r, 0) and (r, 0) in the vehicle coordinate system are the areas the vehicle cannot reach at once. The radius of rotation could in principle be estimated as the minimum of the radii of the spheres passing through every three adjacent keyframes. However, the y-axis in the SLAM map represents height and does not significantly affect the vehicle's direction. Therefore, the coordinates of all keyframes are projected onto the xz-plane, and the minimum of the radii of the circles passing through every three adjacent projected coordinates is set as the vehicle's radius of rotation. The center of the circle passing through the three points (x1,z1), (x2,z2), and (x3,z3) can be obtained from Eq. (1), and the radius from Eq. (2).
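In the standard closed form (to which Eq. (1) and Eq. (2) presumably correspond; the exact expressions used by the authors may differ), the center $(x_c, z_c)$ and radius $R$ of the circle through the three projected points are

$$x_c = \frac{(x_1^2+z_1^2)(z_2-z_3)+(x_2^2+z_2^2)(z_3-z_1)+(x_3^2+z_3^2)(z_1-z_2)}{2\left[x_1(z_2-z_3)+x_2(z_3-z_1)+x_3(z_1-z_2)\right]},$$

$$z_c = \frac{(x_1^2+z_1^2)(x_3-x_2)+(x_2^2+z_2^2)(x_1-x_3)+(x_3^2+z_3^2)(x_2-x_1)}{2\left[x_1(z_2-z_3)+x_2(z_3-z_1)+x_3(z_1-z_2)\right]},$$

$$R = \sqrt{(x_1-x_c)^2+(z_1-z_c)^2},$$

where the denominator vanishes only when the three keyframes are collinear, i.e., during straight driving, which imposes no turning constraint.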
If the target keyframe is in unreachable areas, the vehicle is controlled to correct
its location. In this case, the vehicle reverses in the opposite direction of the
target keyframe’s position with respect to the vehicle’s coordinate system. For example,
if the target keyframe is unreachable and is in front of the left side of the vehicle,
then the vehicle reverses to the right rear. This correction process is only carried
out until the keyframe is out of the unreachable areas, and the vehicle follows the
trajectory again after that.
Fig. 2. Areas that the vehicle cannot reach at once.
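The exception handling can be sketched as follows, assuming the target keyframe has already been projected onto the xz-plane of the vehicle coordinate system with x positive to the vehicle's left and z forward; these sign conventions and the example values are assumptions, since the paper only fixes the circle centers at (-r, 0) and (r, 0).

```python
import math

def is_unreachable(target_xz, r):
    """True if the target lies inside one of the circles of radius r centered at
    (-r, 0) and (r, 0), i.e. the areas the vehicle cannot reach at once (Fig. 2)."""
    x, z = target_xz
    return math.hypot(x + r, z) < r or math.hypot(x - r, z) < r

def reverse_direction(target_xz):
    """Reverse away from the target: if the unreachable keyframe is on the front-left,
    back up to the right rear, and vice versa. The returned value is the assumed sign
    of the steering command while reversing (negative = steer right)."""
    x, _ = target_xz
    return -1.0 if x > 0.0 else 1.0

# Example: with a 2.0 m rotation radius, a keyframe 0.3 m to the left and 0.5 m ahead
# lies inside the left circle, so the vehicle reverses to the right rear until the
# keyframe leaves the unreachable area, then resumes following the trajectory.
target = (0.3, 0.5)  # x positive to the left in this sketch
if is_unreachable(target, 2.0):
    steer_sign = reverse_direction(target)
```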
3.3 Scale Ratio Estimation
The initial scale between the SLAM map and the physical world is given at the beginning
of the system. This scale is used for all processes using the SLAM map. The vehicle
continues driving, knowing the scale ratio between the SLAM map and the real world.
Scale update is the process of constantly modifying the relational expression between
the actual length and the SLAM coordinates because of scale changes in the SLAM map.
SLAM map is created based on the coordinates of matched feature points. If the errors
in feature point matching accumulate, the scale of the SLAM map may change. As a result,
the initial scale value may be unsuitable for the current SLAM map.
First, the time during which the scale line is observed is measured when the initial scale is obtained, so that scale updates can be performed later. When a scale update is needed, the vehicle's displacement on the SLAM map over that measured time is calculated again. Since an accurate scale calculation requires the displacement of a vehicle traveling at the same speed, this process is performed only during straight driving; curved driving makes it difficult to calculate the distance traveled. The new scale is then calculated from the distance traveled on the current SLAM map and the distance traveled when the initial scale was calculated. The updated scale is given by Eq. (3), which allows a scale that fits the SLAM map to be used even if errors accumulate in the map.
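As a hedged reading of this update (the symbols below are introduced here for illustration; the exact form of Eq. (3) may differ), the constant driving speed means the physical distance covered during the measured time is the same in both measurements, so the updated scale can be written as

$$s_{\text{new}} = s_{\text{init}} \cdot \frac{d_{\text{init}}}{d_{\text{new}}},$$

where $s_{\text{init}}$ is the initial scale ratio, $d_{\text{init}}$ is the vehicle's displacement on the SLAM map over the measured time when the initial scale was obtained, and $d_{\text{new}}$ is its displacement on the current map over the same time.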
3.4 Other Modules
Road sign recognition and parking bay detection are assumed to be performed on AVM images. However, since the cameras attached to the vehicle are fisheye cameras, edge and line detection algorithms for parking bay detection cannot be applied directly because of severe radial distortion. Therefore, the distortion is corrected and the views are merged into a top-down AVM image to identify the parking bay. This merging process, however, has a high computational cost and is unsuitable for a real-time parking system. Hence, the proposed system employs a LUT that contains the mapping relationships between AVM image coordinates and fisheye image coordinates. After the ROI of the AVM image that needs to be processed is stored in the LUT in advance, the system performs parking line detection by accessing the pixel values at the corresponding fisheye image coordinates through the LUT. In other words, real-time performance is guaranteed by significantly reducing processing time, as the mapping information is stored in the LUT and only read when necessary.
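A minimal sketch of such a lookup is shown below; avm_to_fisheye stands for a hypothetical per-pixel mapping obtained from the camera calibration and the AVM (top-view) transform, and the nearest-neighbour sampling is a simplification.

```python
import numpy as np

def build_lut(roi_height, roi_width, avm_to_fisheye):
    """Precompute, for every pixel of the AVM-image ROI, the corresponding coordinate
    in the distorted fisheye image. avm_to_fisheye(x, y) -> (u, v) is a hypothetical
    mapping derived from the calibration and the top-view (AVM) transform."""
    map_u = np.zeros((roi_height, roi_width), dtype=np.float32)
    map_v = np.zeros((roi_height, roi_width), dtype=np.float32)
    for y in range(roi_height):
        for x in range(roi_width):
            map_u[y, x], map_v[y, x] = avm_to_fisheye(x, y)
    return map_u, map_v

def sample_roi(fisheye_image, map_u, map_v):
    """Read the ROI pixel values directly from the fisheye image through the LUT
    (nearest-neighbour for brevity), so the full AVM image never has to be rendered."""
    u = np.clip(np.rint(map_u).astype(np.int64), 0, fisheye_image.shape[1] - 1)
    v = np.clip(np.rint(map_v).astype(np.int64), 0, fisheye_image.shape[0] - 1)
    return fisheye_image[v, u]
```

Because the maps are computed once and only read at run time, the per-frame cost reduces to indexed memory reads over the ROI, which is what keeps the parking line detection real-time capable.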
The actual parking process is implemented in five steps. The first step locates the vacant parking bay after positively identifying it through the process described above; in this step, the steering angle of the wheels is turned in the direction opposite the parking bay. The vehicle is then driven forward a certain distance that depends on the type of vehicle, while its current location is compared with the destination indicated after mapping to keep the vehicle within a certain distance. The third parking step is carried out after the vehicle has moved to the destination a certain distance away. In this step, the vehicle's wheels are steered in the direction of the parking bay, and the vehicle is reversed. If the vehicle becomes parallel to the parking line while reversing, the steering angle is restored to straight (i.e., parallel to the parking line as well). The fisheye cameras attached to the left and right of the vehicle are used to determine whether the vehicle is parallel to the parking line: the slope of the detected lines when the vehicle is parallel to the parking line is obtained in advance, and if the two slopes are identical, the vehicle is considered parallel. Once the vehicle is reversing parallel to the parking line, it starts to determine whether it has entered the parking bay exactly. The rear camera continuously detects the line behind the parking bay (i.e., the baseline) and measures the gap between the vehicle and this line. When this gap decreases below a certain threshold, the vehicle is considered to be correctly located inside the parking bay and is stopped, and the parking algorithm terminates.
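The parallelism test used while reversing can be sketched as a simple slope comparison; the reference slope, tolerance, and line representation below are illustrative assumptions rather than values from the paper.

```python
def segment_slope(x1, y1, x2, y2):
    """Slope of a detected parking-line segment in the side-camera image."""
    return float("inf") if x2 == x1 else (y2 - y1) / (x2 - x1)

def is_parallel(detected_segment, reference_slope, tol=0.05):
    """reference_slope is measured in advance with the vehicle parallel to the parking
    line; while reversing, the vehicle is treated as parallel when the currently
    detected slope matches it within an (assumed) tolerance."""
    return abs(segment_slope(*detected_segment) - reference_slope) <= tol

# Example: a segment detected by the left camera while reversing.
if is_parallel((120, 40, 380, 52), reference_slope=0.05):
    pass  # restore the steering angle to straight
```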
4. Experiments
4.1 Implementation Environment
The proposed system was implemented on an Intel Core i7-7700HQ CPU with 8 GB of RAM running Ubuntu 16.04. The experiment was conducted both indoors and outdoors to simulate an actual parking lot, and the road signs and parking bays were scaled down in proportion to the vehicle. Furthermore, the distances used in the parking algorithm were calculated and adjusted to the rotation of the HENES T870 model vehicle. Finally, a Nucleo board was used to send vehicle control signals.
4.2 Keyframe-based Autonomous Driving
Two errors, namely translation and rotation errors, need to be estimated to measure
the accuracy of the control algorithm used in keyframe-based autonomous driving. They
were measured when the vehicle returned to its original position after loop closing.
Specifically, the translation error was estimated by measuring the position difference, and the rotation error by measuring the angle difference, between the initial and the returned driving. Driving was considered successful if the translation error was less than 10 cm and the rotation error less than 15 degrees.
At the end of the experiment, the translation measurement reported an 80% success rate, and the rotation measurement reported an 86.67% success rate. The experiment was performed fifteen times at the three locations shown in Fig. 3, and the results are shown in Table 1. The experiment was conducted in indoor and outdoor environments of various sizes and showed high accuracy and success rates. In comparison, most autonomous driving methods for autonomous valet parking systems rely on costly infrastructure. For example, Huang et al. [22] proposed autonomous driving that detects infrastructure such as marks on the wall to follow the route the vehicle has already traveled and thereby achieve high accuracy. The results in Table 1 show that similar results can be produced without using any infrastructure for driving route creation and tracking, even though there is a slight difference in performance between the two works. This comparison indicates that using only SLAM data for autonomous driving works without significant error. Therefore, the experimental results above suggest that keyframe-based vehicle control can be applied in AVP, which repeats driving within the same environment.
Fig. 3. Locations for the keyframe-based autonomous driving experiment: (a) Narrow indoor space with light reflection on the floor; (b) Narrow outdoor space; (c) Large outdoor space.
Table 1. Result of the keyframe-based autonomous driving experiment.
Movement | Success Rate (%)
Translation | 80
Rotation | 86.67
4.3 Parking
In this experiment, the success rate and accuracy of the parking algorithm were measured. The experiments, performed fifteen times, were conducted indoors with light reflections on the ground and outdoors on asphalt, a typical parking lot environment, to check the robustness of parking line detection. The parking bay was placed only on the left side of the vehicle so that the experiments could be repeated under the same conditions. A trial was considered successful if the distance from the vehicle to the nearest side parking line was less than 18 cm, as shown in Fig. 4; this value was chosen by considering the sizes of the parking bay and the vehicle. As shown in Table 2, the indoor experiment reported a success rate of about 66%, and the outdoor experiment about 73%. Parking line detection and template matching were performed normally in both environments. However, if the vehicle could not move exactly the required distance during the parking algorithm, it was parked incorrectly, as shown in Fig. 5. This problem appears to be caused by an incorrect scale ratio between the map and the physical world. The other aspects, such as parallelism and completeness of parking, were handled normally.
Fig. 4. Result of a successful parking.
Fig. 5. Results showing parking failures.
Table 2. Result of the autonomous parking experiment.
Environment | Success Rate (%)
Indoor | 66.67
Outdoor | 73.33
5. Conclusion
This research proposes a vehicle control method for a fully autonomous valet parking system. The proposed method is based on visual SLAM, and the vehicle is driven by tracing the trajectory using keyframes in the SLAM map. We also proposed estimating the scale ratio between the SLAM map and the physical world by introducing a scale line and updating the scale continuously.
As future work, scale estimation without auxiliary scale lines will be studied. Furthermore, as loop closing is not always guaranteed, visual SLAM-based control without loop closing will also be studied. Lastly, the proposed system will be tested on a real car for commercialization.
ACKNOWLEDGMENTS
This work was supported by a Korea Evaluation Institute of Industrial Technology (KEIT) grant funded by the Korea government (Ministry of Trade, Industry and Energy) in 2020 (No. 20009775, Development of AI-based Around View Monitoring SoC for Automated Valet Parking).
REFERENCES
Khalid M., Wang K., Aslam N., Cao Y., Ahmad N., Khan M.K., February 2021, From smart parking towards autonomous valet parking: A survey, challenges and future works, Journal of Network and Computer Applications, Vol. 175
Han S. J., Choi J., December 2015, Parking Space Recognition for Autonomous Valet
Parking Using Height and Salient-Line Probability Maps, ETRI Journal, Vol. 37, No.
6, pp. 1220-1230
Yin H., Wang Y., Tang L., Ding X., Huang S., Xiong R., February 2021, 3D LiDAR Map
Compression for Efficient Localization on Resource Constrained Vehicles, IEEE Transactions
on Intelligent Transportation Systems, Vol. 22, No. 2, pp. 837-852
Zhang J., Singh S., July 2014, LOAM: Lidar Odometry and Mapping in Real-time, Robotics: Science and Systems, Vol. 2, No. 9
Agrawal M., Konolige K., 2006, Real-time localization in outdoor environments using
stereo vision and inexpensive gps, 18th International Conference on Pattern Recognition
(ICPR’06), pp. 1063-1068
Tseng P. K., Hung M. H., Yu P. K., Chang S. W., Wang T. W., September 2014, Implementation
of an autonomous parking system in a parking lot, 2014 world congress on intelligent
transport systems
Luca R., Troester F., Gall R., Simon C., January 2010, Autonomous Parking Procedures
Using Ultrasonic Sensors, Annals of DAAAM & Proceedings
Kummerle R., Hahnel D., Dolgov D., Thrun S., Burgard W., 2009, Autonomous driving
in a multi-level parking structure, 2009 IEEE International Conference on Robotics
and Automation, pp. 3395-3400
Song J., Zhang W., Wu X., Cao H., Gao Q., Luo S., September 2019, Laser-based SLAM
automatic parallel parking path planning and tracking for passenger vehicle, IET Intelligent
Transport Systems, Vol. 13, No. 10, pp. 1557-1568
Qin T., Chen T., Chen Y., Su Q., 2020, Avp-slam: Semantic visual mapping and localization
for autonomous vehicles in the parking lot, 2020 IEEE/RSJ International Conference
on Intelligent Robots and Systems (IROS), pp. 5939-5945
Hartley R., Zisserman A., 2004, Multiple View Geometry in Computer Vision, Cambridge
University Press, pp. 237-262
Rublee E., Rabaud V., Konolige K., Bradski G., 2011, ORB: An efficient alternative
to SIFT or SURF, 2011 International Conference on Computer Vision, pp. 2564-2571
Mur-Artal R., Montiel J. M. M., Tardós J. D., October 2015, ORB-SLAM: a versatile and accurate monocular SLAM system, IEEE Transactions on Robotics, Vol. 31, No. 5, pp. 1147-1163
Li H., Xiong P., Fan H., Sun J., 2019, Dfanet: Deep feature aggregation for real-time
semantic segmentation, Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition, pp. 9522-9531
Iyer G., Krishna Murthy J., Gupta G., Krishna M., Paull L., 2018, Geometric consistency
for self-supervised end-to end visual odometry, Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition Workshops, pp. 267-275
Zhan H., Garg R., Weerasekera C. S., Li K., Agarwal H., Reid I., 2018, Unsupervised
learning of monocular depth estimation and visual odometry with deep feature reconstruction,
Proceedings of the IEEE conference on computer vision and pattern recognition, pp.
340-349
Sheng L., Xu D., Ouyang W., Wang X., 2019, Unsupervised collaborative learning of
keyframe detection and visual odometry towards monocular deep slam, Proceedings of
the IEEE/CVF International Conference on Computer Vision, pp. 4302-4311
Zhan H., Weerasekera C. S., Bian J. W., Reid I., May 2020, Visual odometry revisited:
What should be learnt?, 2020 IEEE International Conference on Robotics and Automation
(ICRA), pp. 4203-4210
Wang Y., Cai S., Li S. J., Liu Y., Guo Y., Li T., Cheng M. M., December 2018, Cubemap
slam: A piecewise-pinhole monocular fisheye slam system, Asian Conference on Computer
Vision, pp. 34-49
Schreiber M., Knöppel C., Franke U., June 2013, Laneloc: Lane marking based localization
using highly accurate maps, 2013 IEEE Intelligent Vehicles Symposium (IV), pp. 449-454
Ranganathan A., Ilstrup D., Wu T., November 2013, Light-weight localization for vehicles
using road markings, 2013 IEEE/RSJ International Conference on Intelligent Robots
and Systems, pp. 921-927
Huang Y., Zhao J., He X., Zhang S., Feng T., 2018, Vision-based Semantic Mapping and
Localization for Autonomous Indoor Parking, IEEE Intelligent Vehicles Symposium (IV),
pp. 636-641
Author
Young Gon Jo received his B.S. degree in computer science and electrical engineering
from Handong University in 2021. His research interest includes image processing,
SLAM, and autonomous driving technology.
Seok Hyeon Hong received his B.S. degree in computer science and engineering from
Handong University in 2021. His research interest includes computer graphics and computer
vision.
Jeong Mok Ha received his B.S. degree in electrical engineering from Pusan National
University in 2010 and a Ph.D. degree from Pohang University of Science and Technology
(POSTECH) in 2017. He is interested in automotive vision, including camera calibration,
surround view, deep learning, and SLAM.
Sung Soo Hwang received his B.S. degree in electrical engineering and computer
science from Handong University in 2008 and M.S. and Ph.D. degrees from Korea Advanced
Institute of Science and Technology in 2010 and 2015, respectively. His research interest
includes the creation and operation of video maps for augmented reality and autonomous
driving.