Kim Seungbin1
Rodi Hartono1
Kalend Tshibang Patrick A1
Baik Namkyun2*
Shin Kyoo Jae1
1 Department of Artificial Intelligence Convergence, Busan University of Foreign Studies, Busan, Korea (skbbq123@naver.com)
2 Department of Cyber Security, Duksung Women's University, Seoul, Korea (namkyun@duksung.ac.kr)
Copyright © The Institute of Electronics and Information Engineers (IEIE)
Keywords
Android, Object recognition, SSD, YOLO, Visually impaired, Real-time
1. Introduction
Object recognition is the computer recognition of objects in images captured from real-world scenes. The term combines classification and localization: objects are classified and their locations identified simultaneously. Object recognition technology is widely applied in the medical, autonomous driving, and military fields, and research on it is being actively conducted. Visually impaired people still suffer considerable inconvenience in daily life, and accidents are frequent, so an alternative solution to these problems is needed. This paper proposes a dangerous object recognition application that addresses this problem through image recognition on a mobile phone.
Object recognition networks are generally divided into one-stage and two-stage methods. In the one-stage method, localization and classification are performed simultaneously, whereas in the two-stage method they are performed separately. The two-stage method is more accurate but slower because it requires one more step. Joseph Redmon et al. proposed the you only look once (YOLO) algorithm, a one-stage object recognition method [1], and Wei Liu et al. studied the single shot multibox detector (SSD) algorithm [2]. For the two-stage method, Shaoqing Ren et al. researched the faster region-based convolutional neural network (Faster R-CNN) algorithm [3]. The Faster R-CNN method is unsuitable for real-time object detection because, as a two-stage method, its speed is relatively slow. Sanika Dosi et al. applied object recognition to mobile phones [4], and Sumitra A. Jakhet et al. studied an object detection application for the visually impaired [5].
This paper proposes a system that recognizes objects in front of the user with a smartphone camera and informs the visually impaired user in real time. A mobile-based real-time dangerous object recognition system is proposed using an SSD network, a one-stage detector suitable for real-time object recognition among the two representative object recognition approaches.
2. Design of Risk Object Recognition System for the Visually Impaired
This paper presents a system for real-time dangerous object recognition for the visually impaired. For the visually impaired, even objects that pose no hazard to non-disabled people may be a risk. Therefore, this study designed a mobile application that recognizes everyday objects, identifies what is in front of the user, and informs the visually impaired user. Mobile phones are easy to use and light, which makes them very convenient. Exploiting this, a mobile phone application for the visually impaired can help relieve the inconveniences they face in daily life. The system proposed in this paper is based on dangerous object recognition.
After the dataset is trained in advance, the model is converted to a pb file suitable for Android applications, and an Android application embedding the model is then produced. When a specific object is detected [6], the application sounds an alarm to notify the user of what kind of object is in front of them. Among the recognized objects, the class with the highest priority is processed first using a priority queue, so the user is notified of recognized objects in priority order. The user hears the voice and knows what object is in front.
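As an illustration, the priority-queue selection can be sketched as follows. This is a minimal Java sketch, not the exact application code: the Detection class, the priority values, and all names are hypothetical.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Minimal sketch of priority-based notification (hypothetical names and values).
public class DangerNotifier {
    // Lower number = higher priority; the ordering here is illustrative only.
    private static final Map<String, Integer> PRIORITY = new HashMap<>();
    static {
        PRIORITY.put("person", 0);
        PRIORITY.put("car", 1);
        PRIORITY.put("bicycle", 2);
        PRIORITY.put("traffic light", 3);
    }

    public static class Detection {
        final String label;
        final float confidence;
        Detection(String label, float confidence) {
            this.label = label;
            this.confidence = confidence;
        }
    }

    // Queue the target classes recognized in one frame and return the one
    // that should be announced first (or null if none was a target class).
    public static Detection selectMostUrgent(Iterable<Detection> detections) {
        PriorityQueue<Detection> queue = new PriorityQueue<>(
                Comparator.comparingInt((Detection d) ->
                        PRIORITY.getOrDefault(d.label, Integer.MAX_VALUE)));
        for (Detection d : detections) {
            if (PRIORITY.containsKey(d.label)) {
                queue.add(d);
            }
        }
        return queue.peek();
    }
}
```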
Fig. 2 shows the structure of the one-stage detector. Unlike the two-stage method, in which region proposal and classification are performed sequentially, the one-stage method performs them simultaneously. It is therefore relatively fast, which is a large advantage for real-time object detection [7]. Recently, the accuracy of one-stage detectors has improved to a level close to that of two-stage detectors.
Fig. 1. Overall Object Recognition System.
Fig. 2. One-stage Detector.
3. Object Recognition Network Training
Although various networks exist for object detection, this paper selects a one-stage detector suitable for real-time object detection. The representative SSD and YOLO networks among the one-stage methods are considered. Both have the advantage of high speed, which makes them suitable for real-time object detection. On the other hand, the YOLO network has difficulty distinguishing overlapping objects, while the SSD network is more difficult to use than YOLO. In this study, the SSD network was selected to design the real-time object recognition system [8].
3.1 SSD Network
The SSD model is a one-stage detector like YOLO. The entire network uses the pretrained VGG16 as a base, to which an auxiliary network is added. The auxiliary network is an ordinary convolutional network attached by replacing VGG16's fully connected layers with convolutional layers. Detection speed improves in this process, and the overall network resembles a general convolutional network.
Fig. 3 presents the rough structure of the SSD network. The main idea of the SSD model is to use feature maps of various scales. Existing models used only feature maps of a single size, and with a feature map of a constant scale it can be difficult to detect objects of various sizes. The SSD model therefore extracts feature maps from intermediate convolutional layers and uses them for detection.
3.2 Loss Function
The loss function of the SSD model is the sum of the confidence loss and the localization loss. ${\alpha}$ is a parameter that adjusts the weight between the two losses, and ${\alpha}$ = 1 is used. N is the number of matched default boxes; if N = 0, the loss is set to 0. The complete formula is shown in Eq. (1).
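As formulated in the original SSD paper [2], with $x$ the default-box matching indicator, $c$ the class confidences, $l$ the predicted boxes, and $g$ the ground-truth boxes:

$$L(x,c,l,g)=\frac{1}{N}\left(L_{conf}(x,c)+\alpha L_{loc}(x,l,g)\right) \quad (1)$$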
Like the Faster R-CNN model, the localization loss is a smooth L1 loss over the center coordinates, width, and height of the default box, as shown in Eq. (2) below.
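In the notation of [2], where $\hat{g}$ denotes the encoded ground-truth offsets:

$$L_{loc}(x,l,g)=\sum_{i\in Pos}^{N}\sum_{m\in\{cx,cy,w,h\}} x_{ij}^{k}\,\mathrm{smooth}_{L1}\!\left(l_{i}^{m}-\hat{g}_{j}^{m}\right) \quad (2)$$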
The confidence loss is a softmax loss over all class confidences, as shown in Eq. (3).
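Again following [2], with $\hat{c}_{i}^{p}$ the softmax of the class scores:

$$L_{conf}(x,c)=-\sum_{i\in Pos}^{N} x_{ij}^{p}\log\left(\hat{c}_{i}^{p}\right)-\sum_{i\in Neg}\log\left(\hat{c}_{i}^{0}\right),\qquad \hat{c}_{i}^{p}=\frac{\exp(c_{i}^{p})}{\sum_{p}\exp(c_{i}^{p})} \quad (3)$$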
3.3 Training SSD
As shown in Fig. 3, the entire network was constructed on top of the pretrained model. The two fully connected layers were replaced with convolutional layers, and the extra network was designed so that the size of the final output became 1x1. Feature maps of different scales were then obtained, and convolutional operations were applied to each of them. The resulting feature maps were merged, and the SSD network was trained with the loss function above.
4. Experimental Result
This paper applies real-time dangerous object recognition for the visually impaired in an Android application. It is therefore essential to identify objects that can threaten the visually impaired in daily life, so the mobile application targets the most easily conceivable dangerous objects. The four items in Table 1 were selected because they were judged to be the most common dangerous objects that visually impaired people encounter outdoors in daily life.
Table 1. Target Object Table.

Target Object
1. Person
2. Car
3. Bicycle
4. Traffic light
Table 2. Hardware Specification.
Device
|
LG V30
|
Size
|
151.7 × 75.4 × 7.3mm
|
weight
|
158g
|
Rear camera
|
16MP
|
GPS
|
GPS, GLONASS, GALILEO
|
Sensor
|
Fingerprint, acceleration, and proximity
|
battery
|
3300mAh
|
USB port
|
3.1, Type-C 1.0
|
4.1 Hardware, Software Configuration
The hardware used in this experiment was an LG V30 smartphone; its specifications are listed in Table 2. The experiment results may vary depending on the camera performance of the hardware.
As shown in Table 2, the camera used for object recognition has a 16-megapixel specification. The device is light enough for a visually impaired user to carry even if it is used only for object recognition, and its built-in sensors and GPS leave room for more advanced functions. During development, the Android device was connected to a computer through the USB port, and application design and creation were carried out in Android Studio. In addition, a function was added that alarms the user when the highest-priority class in the priority queue is a specific object.
Fig. 4 shows the code that plays a sound file only when the class recognized by the smartphone is 'person'. By varying the sound according to the object, the user can tell what kind of object it is by sound alone.
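Since Fig. 4 is shown only as an image, the following minimal Java sketch illustrates the kind of check it describes; the class and resource names (AlarmPlayer, R.raw.person) are assumptions, not the exact code of Fig. 4.

```java
import android.content.Context;
import android.media.MediaPlayer;

// Sketch of the alarm hook described for Fig. 4: play a sound
// only when the recognized class is "person" (hypothetical names).
public class AlarmPlayer {
    private final Context context;

    public AlarmPlayer(Context context) {
        this.context = context;
    }

    public void onObjectRecognized(String label) {
        if ("person".equals(label)) {
            // R.raw.person is an assumed sound resource (res/raw/person.mp3).
            MediaPlayer player = MediaPlayer.create(context, R.raw.person);
            player.setOnCompletionListener(MediaPlayer::release);
            player.start();
        }
    }
}
```

A different resource can be played per class in the same way, which is how the per-object sounds described above would be wired in.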
The model uses the COCO dataset and can recognize approximately 90 types of objects. Among them, the dangerous objects are selected, and a separate alarm function is added for them. The system gives sound priority to high-priority objects, and the priorities can be adjusted easily in Android Studio as needed.
Fig. 4. Add Sound Code on Android Studio.
4.2 Experimental Result
This paper proposes a notification system based on dangerous object recognition for the visually impaired. Four objects were selected as the experiment conditions: person, car, bicycle, and traffic light. In the experiment, the four objects were photographed randomly, and the recognition rate was calculated from the number of successful recognition-and-notification events out of the total number of trials. The recognition rate is calculated as in Eq. (4).
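Based on the description in this section (the numerator counts correct audible notifications), the rate can be written as:

$$\text{Recognition rate}\,(\%)=\frac{\text{number of correct notifications}}{\text{total number of trials}}\times 100 \quad (4)$$

For example, 48 correct notifications in 50 trials give a recognition rate of 96%.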
Because this paper focuses on the visually impaired, the final conversion to sound is essential. Therefore, the count reflected in the recognition rate of Eq. (4) is the number of times the user hears the correct notification. Real-time object recognition was carried out by shooting with the designed application.
Fig. 5 shows a frame captured while shooting with a smartphone running the object recognition system. In addition to filming in real life, additional photographs and videos were used. Each of the four objects was tested approximately 50 times. Table 3 lists the recognition rates derived through Eq. (4). With 50 trials per object, the recognition rate was 96% for Person, 96% for Car, 98% for Bicycle, and 94% for Traffic light. These results may vary depending on the degree of light reflection or the weather.
Fig. 5. Object Recognition Screen.
Table 3. Recognition Rate.
Object | Recognition Rate (%)
Person | 96
Car | 96
Bicycle | 98
Traffic light | 94
5. Conclusion
This paper proposed an application for real-time dangerous object detection for the visually impaired. Between the one-stage and two-stage detector methods, the one-stage detector, which is more suitable for real-time object detection, was selected, and a model trained through an SSD network was used. Even an object that is trivial to non-disabled people can be a dangerous object in the life of the visually impaired. Therefore, this paper proposed a system that detects four selected objects: people, cars, bicycles, and traffic lights. The system notifies the visually impaired person with a sound when the camera detects one of these objects. After applying the trained model to the application, experiments were performed approximately 50 times for each item. Because the experiment targeted the visually impaired, it was filmed with a camera in everyday life. The recognition rate was calculated from the number of times the application made a sound after object detection. The results were 96% for people, 96% for cars, 98% for bicycles, and 94% for traffic lights. In view of these results, the proposed system can support a safer life for the visually impaired and further improve their quality of life. In the future, more in-depth studies are needed to detect objects under poor shooting conditions, such as light reflection and bad weather. With further research, the system could also become useful to non-disabled people when a field of vision must be secured in a dark space.
ACKNOWLEDGMENTS
This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ICAN (ICT Challenge and Advanced Network of HRD) program (IITP-2022-2020-0-01825) supervised by the IITP (Institute of Information & Communications Technology Planning & Evaluation). This research was partly supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea Government (MSIT) and a Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea Government (MOTIE) (P0008703, The Competency Development Program for Industry Specialist).
REFERENCES
[1] J. Redmon et al., "You Only Look Once: Unified, Real-Time Object Detection," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, pp. 779-788, 2016.
[2] W. Liu et al., "SSD: Single Shot MultiBox Detector," Computer Vision - ECCV 2016, Lecture Notes in Computer Science, vol. 9905, Springer, Cham, 2016.
[3] S. Ren et al., "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137-1149, June 2017.
[4] S. Dosi et al., "Android Application for Object Recognition using Neural Networks for the Visually Impaired," 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), Pune, India, pp. 1-6, 2018.
[5] S. A. Jakhet et al., "Object Recognition App for Visually Impaired," 2019 IEEE Pune Section International Conference (PuneCon), Pune, India, pp. 1-4, 2019.
[6] W. Tarimo et al., "Real-Time Deep Learning-Based Object Detection Framework," 2020 IEEE Symposium Series on Computational Intelligence (SSCI), Canberra, ACT, Australia, pp. 1829-1836, 2020.
[7] P. Devaki et al., "Real-Time Object Detection using Deep Learning and OpenCV," International Journal of Innovative Technology and Exploring Engineering (IJITEE), ISSN 2278-3075, vol. 8, no. 12S, October 2019.
[8] I. Martinez-Alpiste et al., "Smartphone-based real-time object recognition architecture for portable and constrained systems," Journal of Real-Time Image Processing, vol. 19, pp. 103-115, 2022.
Author
Seung Bin Kim received his bachelor's degree from Busan University of Foreign Studies,
Republic of Korea, in 2021. Since 2021, he has been pursuing his master's program
at the Department of Artificial Intelligence Convergence, Busan University of Foreign
Studies. He has a passionate interest in Artificial Intelligence Convergence, Image
Recognition, Deep Learning, and IoT.
Rodi Hartono is an electrical engineer with a strong interest and ability in Artificial Intelligence Convergence, Control Systems, Robotics, Image Recognition, Deep Learning, and Automation Systems. He has worked in the robotics and intelligent systems field for more than 12 years as a lecturer, full-time researcher, and leader responsible for managing and supervising the UNIKOM robotics team, which researches and builds robots to compete in regional, national, and international robot competitions. His team is also responsible for designing, researching, and building robotics products for the industrial community in the robotics division laboratory of the Indonesian Computer University (UNIKOM). He received his bachelor's degree from Indonesian Computer University, Indonesia, in 2010, and his master's degree from the School of Electrical and Informatics Engineering, Bandung Institute of Technology (ITB), Indonesia, in 2014. Since 2021, he has been pursuing his Ph.D. at the Department of Artificial Intelligence Convergence, Busan University of Foreign Studies, Republic of Korea.
Tshibang Patrick a Kalend received his bachelor's degree in Computer Science Engineering, specifically in Information System Engineering, from Université Protestante de Lubumbashi, Democratic Republic of Congo, in 2016. After graduation, he lectured and supervised students' graduation projects. In 2022, he started his master's program at the Department of Artificial Intelligence Convergence, Busan University of Foreign Studies. He is passionate about Artificial Intelligence, Computer Vision, Robotics, and IoT.
Nam Kyun Baik is a professor in the Department of Cyber Security at Duksung Women's University. He received his B.S., M.S., and Ph.D. degrees from the School of Electronic Engineering at Soongsil University. From 2000 to 2017, he was a senior researcher at the Korea Internet & Security Agency. He has a passionate interest in Convergence Security, Security Consulting, Information Security Management Systems, AI Security, and IoT Security.
Kyoo Jae Shin is a professor of Intelligence Robot Science at the Busan University of Foreign Studies (BUFS), Busan, South Korea, and the director of the Future Creative Science Research Institute at BUFS. He received his B.S. degree in Electronics Engineering in 1985, his M.S. degree in Electrical Engineering from Cheonbuk National University (CNU) in 1988, and his Ph.D. degree in Electrical Science from Pusan National University (PNU) in 2009. Dr. Shin was a professor at the Navy technical education school and a main director for research on the dynamic stabilization system at the Dusan defense weapon research institute. He has researched and developed fish robots, submarine robots, an automatic drug-spraying robot in a glass room, an automatic milking robot using a manipulator, a personal electric vehicle, a smart accumulated aquarium using a heat pump, a solar tracking system, a 3D hologram system, and a gun/turret stabilization system. He is interested in intelligent robots, image signal processing application systems, and smart farms and aquariums using new energy and IoT technology.