A Study on the Improvement of Object Detection Performance by Infrared Data Augmentation
based on Diffusion Models
Seonghyun Park1, Taeyoung Lee1, Jongsik Ahn1, Haemoon Kim1, Hyunhak Kim1, Seoyoung Kim1, and Byungin Choi*

(Intelligence Software Team, Hanwha Systems Co., Ltd., Seongnam-si, Gyeonggi-do 13524, Korea
{seonghyun, ty.lee, jongsik.ahn, haemoon1205, kim.hyun.hak95, seoyoung.kim, byungin.choi}@hanwha.com)
Copyright © The Institute of Electronics and Information Engineers (IEIE)
Keywords
Infrared image, Object detection, Data augmentation, Generation model, Diffusion model
1. Introduction
Object detection is an image processing technique that classifies object types while determining the location and size of objects in the form of bounding boxes. It has been widely used in various industrial fields, such as autonomous driving, surveillance-security systems, and robotics. In particular, autonomous driving and security systems require high accuracy and diverse training data covering various environmental conditions to minimize errors caused by false detections.
However, visible light images (visible images for short) provide insufficient information under adverse conditions such as fog, rain, and darkness, which increases object detection errors. Fig. 1 shows examples of object detection performance on visible and infrared images captured from the same viewpoint. Visible images can capture semantic information representing
the background and objects, but their quality is easily affected by weather and illumination
conditions. In contrast, infrared images capture the thermal radiation emitted from
objects, thereby overcoming environmental limitations. Therefore, infrared images
can be used in various image processing algorithms, such as object detection and tracking,
which are increasingly essential in night vision and surveillance systems [1]. However, infrared sensors are expensive, and their capture conditions are limited because thermal factors must be considered across environments. In addition, infrared image processing is complex owing to sensor degradation and temperature inversion between the background and objects. Thus, few public infrared image datasets exist; only two of the 244 publicly available object detection datasets consist of infrared images [2]. Consequently, infrared image datasets are confined to specialized fields such as defense and medicine, and their low versatility and small size make models difficult to train [3].
Fig. 1. An example of object detection performance for visible and infrared image pairs.
Data augmentation techniques have been introduced to overcome the lack of image datasets
and to ensure model performance. These techniques increase the diversity of image
datasets in order to alleviate data bias, prevent overfitting, and improve model performance.
However, although such image processing algorithms yield performance gains, they remain inadequate for improving absolute performance because of inherent constraints in the original dataset and the characteristics of infrared images. To overcome this limitation, many studies have proposed image generation methods, such as image-to-image translation algorithms, to compensate for the lack of training data [4]. In addition, diffusion models were developed from a probabilistic approach that overcomes the imbalance between the generator and discriminator of generative adversarial networks (GANs); these models estimate sample distributions according to the probability distribution of the images. With clear goals, such as a specified distribution range and a fixed training objective, diffusion models show superior performance for data augmentation [5].
In this study, we analyze the object detection performance by infrared data augmentation
based on diffusion models according to the number of images, classes, and object size.
First, we translate infrared images from visible images using diffusion models. Then, we train the object detection model on training datasets that mix real and translated infrared images at various ratios.
Finally, we analyze the performance of each object detection model trained on datasets with various mixed ratios ${\lambda}$, along with the factors that accordingly affect object detection performance. Furthermore, we show how the relationship between the distributions of real infrared images and infrared images translated from visible images relates to the quantitative assessment of object detection.
The remainder of this paper is organized as follows. Section II briefly reviews data augmentation techniques; Section III describes the infrared image augmentation technique using diffusion models; Section IV discusses the experimental results; and Section V concludes the paper.
Table 1. Classification of image data augmentations.
| Approach | Techniques |
|---|---|
| Image processing-based approaches | Color Space Transformation, Geometric Transformation, Kernel Filter |
| Learning-based approaches | Adversarial Training, Neural Style Transfer, GAN (Generative Adversarial Networks), Diffusion |
2. Related Work
2.1 Image Processing-based Image Data Augmentation
Data augmentation techniques can be broadly categorized into image processing-based and learning-based approaches [6], as summarized in Table 1. Image processing-based data augmentation mainly increases the number of images by transforming image characteristics or applying specific filters. For example, color space transformation makes a model robust to color variations of the same object by manipulating color information in various ways. Geometric transformation [7] increases the variability of objects through image resizing, rotation, and displacement, improving robustness to changes in object form. Finally, the kernel filter [8] emphasizes the features of an image by applying specific filters. However, image processing-based data augmentation distorts the information in the image during the transformation process. Moreover, the performance improvement is insufficient for data lacking semantic information about objects, such as images obtained from specialized sensors like synthetic-aperture radar (SAR) and infrared.
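To make the three image processing-based families in Table 1 concrete, the following is a minimal Python sketch using OpenCV; the function names and parameter values are illustrative choices, not taken from the paper.

```python
import cv2
import numpy as np

def color_space_transform(img: np.ndarray, hue_shift: int = 10) -> np.ndarray:
    """Shift hue in HSV space to vary color while keeping object identity."""
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    hsv[..., 0] = (hsv[..., 0].astype(np.int16) + hue_shift) % 180
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)

def geometric_transform(img: np.ndarray, angle: float = 15.0, scale: float = 0.9) -> np.ndarray:
    """Rotate and rescale around the image center to vary object pose and size."""
    h, w = img.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    return cv2.warpAffine(img, m, (w, h))

def kernel_filter(img: np.ndarray) -> np.ndarray:
    """Apply a sharpening kernel to emphasize edges and texture."""
    kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=np.float32)
    return cv2.filter2D(img, -1, kernel)
```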
Fig. 2. Illustration of different architectures for learning-based image data augmentation.
2.2 Learning-based Image Data Augmentation
Learning-based image data augmentation trains on domain-specific data to generate images or translate them into a particular style, as in style transfer and adversarial training. Fig. 2 compares the three architectures commonly used for learning-based image data augmentation. The variational autoencoder (VAE) model [9] can generate an image whose distribution is similar to that of the input image using an encoder and a decoder, as illustrated in Fig. 2(a). However, the quality of the generated image is limited by the quality of the input image. The GAN model [10,11] generates images via an adversarial learning process, where the generator produces images whose quality the discriminator cannot distinguish from real ones, and the discriminator improves its ability to distinguish between real and generated images, as illustrated in Fig. 2(b). However, the GAN model requires detailed parameter fine-tuning because of training instability. Finally, as illustrated in Fig. 2(c), the diffusion model [12] adds noise to an image and gradually removes it during training to obtain a data distribution similar to the original, resulting in high-quality images.
3. The Proposed Method
Our approach aims to improve object detection performance limited by insufficient infrared image data. We generate additional infrared image data based on the diffusion model, a type of likelihood-based model known to produce high-quality images, and analyze object detection performance according to the number of images, classes, and object size. Fig. 3 shows an overview of the infrared object detection process, which translates infrared images from visible images and trains an object detection model. Specifically, we employ the pixel space-based diffusion network Palette [13] and the latent space-based diffusion network BBDM [14] for image-to-image translation. The training datasets were constructed at various mixed ratios $\lambda$ between ground truth and translated infrared images (see the sketch after Fig. 3). Finally, $\mathrm{mAP}_{0.5}$ is evaluated for the object detection models trained on infrared image datasets with various mixed ratios $\lambda$.
Fig. 3. Overview of infrared object detection process using image data augmentation based on diffusion models.
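As a minimal sketch of the mixed-ratio dataset construction described above, the following Python snippet draws a fraction $\lambda$ of the training images from the diffusion-translated pool and the rest from the real infrared pool; the exact sampling protocol and file paths are our assumptions.

```python
import random

# Illustrative file lists; in practice these would be real FLIR infrared images
# and the diffusion-translated counterparts of the paired visible images.
real_paths = [f"flir/real_{i:04d}.png" for i in range(3000)]
translated_paths = [f"flir/translated_{i:04d}.png" for i in range(3000)]

def build_mixed_dataset(real, translated, mix_ratio, total=3000, seed=0):
    """Return `total` training image paths, `mix_ratio` of them translated."""
    rng = random.Random(seed)
    n_translated = int(total * mix_ratio)  # e.g., 600 images for lambda = 20%
    dataset = (rng.sample(translated, n_translated) +
               rng.sample(real, total - n_translated))
    rng.shuffle(dataset)
    return dataset

train_set = build_mixed_dataset(real_paths, translated_paths, mix_ratio=0.20)
```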
3.1 Pixel Space-based Palette
Diffusion models consist of a forward process and a reverse process. The forward process
is a Markovian process that iteratively adds Gaussian noise to the original image.
In contrast, the reverse process reconstructs the original image from noise, yielding significant flexibility and tractability. Palette is a pixel space-based diffusion model that unifies the framework for image-to-image translation tasks such as colorization, inpainting, uncropping, and JPEG restoration. Palette trains a reverse process that inverts the forward process to translate an infrared image from a visible image. Given a noisy infrared image $\tilde{y}$, the goal is to recover the target infrared image $y_{0}$. Thus, we parameterize the network $f_{\theta}\left(x,\,\tilde{y},\,\gamma\right)$ by the input visible image $x$, the noisy infrared image $\tilde{y}$, and the current noise level $\gamma$. The loss function is optimized by predicting the noise vector $\epsilon$ as follows:

$$\mathbb{E}_{\left(x,\,y_{0}\right)}\mathbb{E}_{\epsilon \sim \mathcal{N}\left(0,\,I\right)}\mathbb{E}_{\gamma}\left\| f_{\theta}\left(x,\,\tilde{y},\,\gamma\right)-\epsilon \right\|_{p}^{p},\qquad \tilde{y}=\sqrt{\gamma}\,y_{0}+\sqrt{1-\gamma}\,\epsilon .$$
Because Palette minimizes the pixel-level difference between real and translated infrared images over the spatial dimensions, it can effectively produce high-quality translations that preserve structures and texture details.
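The following is a minimal PyTorch sketch of a Palette-style training step under the loss above; the denoising network `model` and its conditioning interface (channel-wise concatenation of the visible image with the noisy infrared image) are assumptions about the implementation.

```python
import torch
import torch.nn.functional as F

def palette_loss(model, x, y0):
    """One training step: x is a visible image batch, y0 the paired infrared
    batch, both of shape (B, C, H, W); `model` predicts the added noise."""
    b = x.size(0)
    gamma = torch.rand(b, device=x.device).view(-1, 1, 1, 1)  # noise level per sample
    eps = torch.randn_like(y0)
    y_noisy = gamma.sqrt() * y0 + (1.0 - gamma).sqrt() * eps  # forward process
    eps_pred = model(torch.cat([x, y_noisy], dim=1), gamma.flatten())
    return F.l1_loss(eps_pred, eps)  # p = 1 here; the objective admits L1 or L2
```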
3.2 Latent Space-based BBDM
Although Palette faithfully translates infrared images from visible images while retaining detailed textures, it generally requires extensive computational resources because it operates in pixel space. To address this issue, latent space-based diffusion models have been developed to train with fewer computational resources. As a latent space-based diffusion model, BBDM performs image-to-image translation between two domains using a stochastic Brownian bridge process, providing promising results. The encoder extracts feature maps from the image and maps them to the latent space. The diffusion process then proceeds according to a variance schedule in the latent space, and the decoder translates the result into an infrared image. The variance schedule $\delta_{t}$ of the Brownian bridge diffusion process can be designed as

$$\delta_{t}=2s\left(m_{t}-m_{t}^{2}\right),\qquad m_{t}=\frac{t}{T},$$

where $T$ is the total number of diffusion steps and $s$ is a scaling factor. The sampling diversity can be tuned by the maximum variance $\delta_{max}=\frac{s}{2}$, reached at the middle step $t=\frac{T}{2}$. To translate infrared images from visible images, the objective function of BBDM is as follows:

$$\mathbb{E}_{x_{0},\,y,\,\epsilon}\left[\left\| m_{t}\left(y-x_{0}\right)+\sqrt{\delta_{t}}\,\epsilon-\epsilon_{\theta}\left(x_{t},\,t\right)\right\|^{2}\right],$$
where $x_{0}$ and $y$ denote the initial visible image state and the infrared image, respectively, and $\epsilon_{\theta}$ is the trained model that estimates ${\epsilon}$. BBDM effectively represents high-dimensional characteristics in the latent space via the encoder rather than in pixel space during the diffusion process, thereby improving learning efficiency and model generalization.
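The following is a minimal PyTorch sketch of the Brownian bridge variance schedule and the BBDM objective above; the scaling factor `s`, the latent shapes, and the network interface are assumptions, and the VQGAN-style encoder/decoder used by BBDM is omitted.

```python
import torch
import torch.nn.functional as F

def bb_schedule(T, s=1.0):
    """m_t = t/T and delta_t = 2*s*(m_t - m_t^2); maximum s/2 at t = T/2."""
    t = torch.arange(1, T + 1, dtype=torch.float32)
    m_t = t / T
    delta_t = 2.0 * s * (m_t - m_t ** 2)
    return m_t, delta_t

def bbdm_loss(eps_model, x0, y, m_t, delta_t, t):
    """x0, y: latent codes of the two domains; t: 0-based timestep indices."""
    m = m_t[t].view(-1, 1, 1, 1)
    d = delta_t[t].view(-1, 1, 1, 1)
    eps = torch.randn_like(x0)
    x_t = (1.0 - m) * x0 + m * y + d.sqrt() * eps  # Brownian bridge state at step t
    target = m * (y - x0) + d.sqrt() * eps         # quantity the network estimates
    return F.mse_loss(eps_model(x_t, t), target)
```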
4. Performance Evaluation
4.1 Experimental Settings
We evaluate infrared object detection performance using the FLIR dataset [15], which contains pairs of well-aligned visible and infrared images from real cameras, including over 375,000 annotations. The visible images were translated into infrared images based on the diffusion models, and the training datasets were then constructed at various ratios of translated images to FLIR dataset images. We use the Yolov5l-TA model [16] for object detection, which exhibits superior detection performance and is actively used in many industrial fields because of its outstanding inference speed. The quantitative metric for object detection performance is the mean average precision ($\mathrm{mAP}$) computed across different intersection over union (IoU) thresholds, as sketched below. Furthermore, object detection models were trained in different settings to analyze the effect of the number of images; Table 2 compares their performance. In addition, we categorized object size based on the pixel scale range of width $W$ and height $H$, and classified the $\mathrm{mAP}$ metrics into $\mathrm{mAP}_{s}$, $\mathrm{mAP}_{m}$, and $\mathrm{mAP}_{l}$ for small, medium, and large objects, as shown in Table 3. Training with 2,000 and 3,000 images yielded equal results, with a $\mathrm{mAP}_{0.5}$ of 62.6%. This indicates that the object detection model reaches performance saturation; the diversity of the FLIR dataset is sufficient even with 3,000 images. Therefore, we set the baseline training set size to 3,000 images in all experiments.
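As a small illustration of the metric, the following sketch computes the IoU that underlies $\mathrm{mAP}_{0.5}$: a prediction counts toward a class's average precision at threshold 0.5 when its IoU with a matched ground-truth box is at least 0.5.

```python
def iou(box_a, box_b):
    """Boxes as (x1, y1, x2, y2); returns intersection over union."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Overlap area 1, union 4 + 4 - 1 = 7: matched at IoU >= 1/7 but not at 0.5.
assert abs(iou((0, 0, 2, 2), (1, 1, 3, 3)) - 1 / 7) < 1e-9
```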
Table 2. The results of object detection performance according to the number of images.
| Numbers | mAP0.5(%) | mAP(%) | mAPs(%) | mAPm(%) | mAPl(%) | mAPperson(%) | mAPcar(%) |
|---|---|---|---|---|---|---|---|
| 1,000 | 61.2 | 34.8 | 21.1 | 60.3 | 71.3 | 52.9 | 69.5 |
| 2,000 | 62.6 | 35.3 | 21.8 | 59.7 | 72.5 | 55.8 | 69.3 |
| 3,000 | 62.6 | 35.2 | 21.6 | 59.9 | 72.9 | 55.6 | 69.6 |
Table 3. The categories of object size within the pixel scale range.
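Since the pixel ranges of Table 3 are not reproduced here, the following sketch uses the common COCO-style area thresholds ($32^2$ and $96^2$ pixels) as an assumed stand-in for the paper's small/medium/large split based on width $W$ and height $H$.

```python
def size_category(w, h):
    """Assumed COCO-style split; the paper's actual pixel ranges may differ."""
    area = w * h
    if area < 32 ** 2:
        return "small"   # contributes to mAP_s
    if area < 96 ** 2:
        return "medium"  # contributes to mAP_m
    return "large"       # contributes to mAP_l
```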
4.2 Quantitative & Qualitative Evaluation
We evaluated the quantitative image-to-image translation performance of the diffusion models using four metrics: peak signal-to-noise ratio (PSNR), structural similarity (SSIM), inception score (IS) [17], and Fréchet inception distance (FID) [18]. Higher PSNR, SSIM, and IS scores indicate better performance, whereas a lower FID score implies better performance. Table 4 quantitatively compares the pixel space-based Palette and the latent space-based BBDM. BBDM yields higher PSNR and SSIM scores than Palette, indicating that its translated infrared images have structures and characteristics similar to the infrared ground truths. In contrast, Palette achieves a higher IS score than BBDM, implying that it translates infrared images of high quality and superior diversity. Furthermore, Palette yields a significantly lower (i.e., better) FID score than BBDM, indicating that Palette preserves detailed information and effectively translates infrared images from visible images.
Table 4. Quantitative comparison of the translated infrared image from visible image.
| Model | PSNR (↑) | SSIM (↑) | IS (↑) | FID (↓) |
|---|---|---|---|---|
| Palette | 18.92 | 0.4826 | 1.2158 | 53.81 |
| BBDM | 21.88 | 0.4992 | 1.1885 | 59.03 |
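For reference, the pixel-level metrics in Table 4 can be computed as in the following sketch using scikit-image; IS and FID additionally require an Inception network (e.g., torchmetrics provides implementations) and are omitted here.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def pair_metrics(real_ir, translated_ir):
    """Both images as uint8 grayscale arrays of identical shape."""
    psnr = peak_signal_noise_ratio(real_ir, translated_ir, data_range=255)
    ssim = structural_similarity(real_ir, translated_ir, data_range=255)
    return psnr, ssim
```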
Fig. 4 shows a qualitative comparison of infrared images translated from visible images by the diffusion models. Palette showed outstanding results on scenes with multiple objects and retained particular textures in the infrared images. In addition, Palette captures spatial information in pixel space and preserves detailed texture, yielding high-quality infrared images. In contrast, BBDM produces noticeable visual artifacts that lead to failures in detecting objects such as trees and cars. This is because the model focuses only on semantic information, while texture information is lost during the diffusion process. These results indicate that although BBDM achieves higher PSNR and SSIM scores, it changes the appearance of objects and eventually degrades object detection performance. In contrast, Palette matches the distribution of the translated infrared images to that of the real infrared images, resulting in promising IS and FID scores.
Fig. 4. Qualitative comparison of the translated infrared image.
4.3 Object Detection Performance Based on Mixed Ratio $\mathbf{\lambda}$
Table 5 compares the quantitative object detection performance on training datasets with various mixed ratios $\lambda$ of translated infrared images. For better comparison, the relative improvement over the baseline score is given in parentheses. A mixed ratio ${\lambda}$ of 100% refers to using only translated infrared images generated by the diffusion models. At a mixed ratio ${\lambda}$ of 20%, Palette achieved 62.8% $\mathrm{mAP}_{0.5}$, a relative improvement of 0.3% over the baseline $\mathrm{mAP}_{0.5}$. This indicates that the translated images improve object detection performance with superior generalization ability. However, further increasing the mixed ratio $\lambda$ degraded $\mathrm{mAP}_{0.5}$ because the translated images fail to fully cover the real infrared image distribution, e.g., at a mixed ratio ${\lambda}$ of 100%. BBDM exhibits tendencies similar to Palette: with a mixed ratio ${\lambda}$ of 10%, it achieved 62.9% $\mathrm{mAP}_{0.5}$, a relative improvement of 0.5%. The larger performance decrease of BBDM relative to Palette stems from the hazy appearance caused by undesirable discrepancies from the distribution of real infrared images.
Table 5. The results of object detection according to Mixed Ratio $\mathbf{\lambda}$.
| Mixed Ratio λ(%) | Baseline mAP0.5(%) | Baseline mAP(%) | Palette mAP0.5(%) | Palette mAP(%) | BBDM mAP0.5(%) | BBDM mAP(%) |
|---|---|---|---|---|---|---|
| 10 | 62.6 | 35.2 | 62.6 | 35.2 | 62.9 (+0.5) | 35.3 (+0.3) |
| 20 | 62.6 | 35.2 | 62.8 (+0.3) | 35.2 | 62.1 | 35.1 |
| 30 | 62.6 | 35.2 | 62.4 | 35.3 (+0.3) | 61.6 | 34.6 |
| 50 | 62.6 | 35.2 | 62.5 | 35.3 (+0.3) | 61.4 | 34.3 |
| 100 | 62.6 | 35.2 | 57.2 | 29.4 | 39.4 | 19.7 |
4.4 Instance-level Analysis
We analyze instance-level object detection performance when training on datasets with various mixed ratios ${\lambda}$ of infrared images translated by Palette and BBDM. Tables 6 and 7 compare object detection performance at the object-size and class levels, respectively. With the mixed ratio $\lambda$ set to 20%, 30%, and 50%, Palette improves $\mathrm{mAP}_{s}$, $\mathrm{mAP}_{m}$, and $\mathrm{mAP}_{l}$ over the baseline because it considers the statistical characteristics of the infrared images. Furthermore, $\mathrm{mAP}_{car}$ consistently increases as the mixed ratio ${\lambda}$ increases in Table 7. This indicates that the structures and texture details of cars are well captured by Palette owing to their relatively simple shape. In addition, the effect on the overall $\mathrm{mAP}_{0.5}$ was not significant because small objects of the person and car classes account for a large proportion of the experimental dataset, as shown in Fig. 5. In contrast, the object detection performance of BBDM degraded as the mixed ratio $\lambda$ increased. BBDM failed to preserve representation ability during the diffusion process of translating infrared images, thereby degrading infrared image quality and overall object detection performance.
Fig. 5. Ratio by object size and class.
Table 6. The results of object detection performance based on the object size.
| Mixed Ratio λ(%) | Baseline mAPs(%) | Baseline mAPm(%) | Baseline mAPl(%) | Palette mAPs(%) | Palette mAPm(%) | Palette mAPl(%) | BBDM mAPs(%) | BBDM mAPm(%) | BBDM mAPl(%) |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 21.6 | 59.9 | 72.9 | 21.7 (+0.5) | 59.7 | 72.4 | 21.8 (+0.9) | 59.9 | 72.7 |
| 20 | 21.6 | 59.9 | 72.9 | 21.6 | 60.1 (+0.3) | 73.2 (+0.4) | 21.5 | 59.9 | 73.4 (+0.7) |
| 30 | 21.6 | 59.9 | 72.9 | 21.7 (+0.5) | 60.0 (+0.2) | 74.2 (+1.8) | 21.1 | 59.3 | 72.2 |
| 50 | 21.6 | 59.9 | 72.9 | 21.7 (+0.5) | 60.3 (+0.7) | 73.8 (+1.2) | 20.8 | 59.6 | 71.6 |
| 100 | 21.6 | 59.9 | 72.9 | 17.3 | 52.1 | 65.7 | 7.7 | 40.0 | 60.2 |
Table 7. The results of object detection performance based on the object class.

| Mixed Ratio λ(%) | Baseline mAPperson(%) | Baseline mAPcar(%) | Palette mAPperson(%) | Palette mAPcar(%) | BBDM mAPperson(%) | BBDM mAPcar(%) |
|---|---|---|---|---|---|---|
| 10 | 55.6 | 69.6 | 55.7 (+0.3) | 69.6 | 55.4 | 70.3 (+1.0) |
| 20 | 55.6 | 69.6 | 55.9 (+0.5) | 69.7 (+0.3) | 54.6 | 69.7 (+0.3) |
| 30 | 55.6 | 69.6 | 54.9 | 69.9 (+0.5) | 53.8 | 69.3 |
| 50 | 55.6 | 69.6 | 54.6 | 70.5 (+1.3) | 53.5 | 69.3 |
| 100 | 55.6 | 69.6 | 42.0 | 72.4 (+4.0) | 22.9 | 55.9 |
4.5 Distribution Analysis
We verify the relationship between the distributions of real infrared images and infrared images translated from visible images. In particular, we utilize the Uniform Manifold Approximation and Projection (UMAP) [19] dimension reduction technique to visualize high-dimensional image features in a low-dimensional space. Fig. 6 illustrates the distributions of the real and translated infrared images. The distribution of Palette-translated images overlaps that of the real infrared images more than BBDM's does. This overlap shows that the distribution of the translated images is similar to that of the real infrared images, which explains why Palette achieves better quantitative and qualitative performance.
Fig. 6. The results of translated infrared images and Ground truth distribution.
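The following is a minimal sketch of the UMAP projection behind Fig. 6 using the umap-learn package; the feature extractor producing the high-dimensional inputs is not specified in the paper and is left abstract here.

```python
import numpy as np
import umap  # pip install umap-learn

def project_2d(real_feats, translated_feats):
    """Each input is (N, D); returns two (N, 2) arrays in a shared embedding."""
    reducer = umap.UMAP(n_components=2, random_state=42)
    all_2d = reducer.fit_transform(np.vstack([real_feats, translated_feats]))
    return all_2d[: len(real_feats)], all_2d[len(real_feats):]
```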
5. Conclusion
In this study, we analyzed object detection performance with infrared data augmentation based on diffusion models according to the number of images, classes, and object size. We first used the pixel space-based method Palette and the latent space-based method BBDM to translate infrared images from visible images. Palette with a mixed ratio of 20% and BBDM with a mixed ratio of 10% improved $\mathrm{mAP}_{0.5}$ by 0.3% and 0.5%, respectively, compared to the baseline. In particular, we demonstrated that infrared data augmentation based on diffusion models can improve object detection performance, overcoming the lack of infrared image datasets. Finally, the experimental results confirmed that the more similar the real and translated infrared image distributions are, the better the qualitative and quantitative performance. Important directions for future work are to study the impact on object detection performance of datasets larger than 3,000 images and to develop a more effective diffusion model for translating infrared images from visible images.
REFERENCES
[1] S. Park, A. G. Vien, and C. Lee, "Cross-modal transformers for infrared and visible image fusion," IEEE Trans. Circuits Syst. Video Technol., vol. 34, no. 2, pp. 770-785, Feb. 2024.
[2] https://paperswithcode.com/datasets?task=object-detection
[3] P. Kaur, B. S. Khehra, and B. S. Mavi, "Data augmentation for object detection: A review," in Proc. IEEE Int. Midwest Symp. Circuits Syst., pp. 537-543, Aug. 2021.
[4] G. Mariani, F. Scheidegger, R. Istrate, C. Bekas, and C. Malossi, "BAGAN: Data augmentation with balancing GAN," arXiv:1803.09655, 2018.
[5] P. Dhariwal and A. Nichol, "Diffusion models beat GANs on image synthesis," in Proc. Adv. Neural Inf. Process. Syst., pp. 8780-8794, 2021.
[6] C. Shorten and T. M. Khoshgoftaar, "A survey on image data augmentation for deep learning," J. Big Data, vol. 6, no. 1, pp. 1-48, 2019.
[7] I. Golan and R. El-Yaniv, "Deep anomaly detection using geometric transformations," in Proc. Adv. Neural Inf. Process. Syst., pp. 9758-9769, 2018.
[8] K. He, J. Sun, and X. Tang, "Guided image filtering," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 6, pp. 1397-1409, Jun. 2013.
[9] R. Lopez et al., "Information constraints on auto-encoding variational bayes," in Proc. Adv. Neural Inf. Process. Syst., pp. 6114-6125, 2019.
[10] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 1125-1134, 2017.
[11] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," in Proc. IEEE Int. Conf. Comput. Vis., pp. 2223-2232, Oct. 2017.
[12] J. Ho, A. Jain, and P. Abbeel, "Denoising diffusion probabilistic models," in Proc. Adv. Neural Inf. Process. Syst., pp. 6840-6851, 2020.
[13] C. Saharia et al., "Palette: Image-to-image diffusion models," in Proc. ACM SIGGRAPH Conf., pp. 1-10, 2022.
[14] B. Li et al., "BBDM: Image-to-image translation with Brownian bridge diffusion models," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Vancouver, Canada, pp. 1952-1961, 2023.
[15] https://www.flir.ca/oem/adas/adas-dataset-form/
[16] H. Kim, J. Ahn, T. Lee, and B. Choi, "The object detector for aerial image using high resolution feature extractor and attention module," J. Korean Inst. Inf. Electr. Commun. Technol., vol. 48, no. 1, pp. 1-11, Jan. 2023.
[17] T. Salimans et al., "Improved techniques for training GANs," in Proc. Adv. Neural Inf. Process. Syst., pp. 2234-2242, 2016.
[18] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, "GANs trained by a two time-scale update rule converge to a local Nash equilibrium," in Proc. Adv. Neural Inf. Process. Syst., pp. 6629-6640, 2017.
[19] L. McInnes, J. Healy, and J. Melville, "UMAP: Uniform manifold approximation and projection for dimension reduction," arXiv:1802.03426, 2020.
Seonghyun Park received the B.S. degree in electrical, electronic, and control
engineering from Hankyong National University, Anseong, South Korea, in 2020, and
the M.S. degree in multimedia engineering from Dongguk University, Seoul, South Korea,
in 2023. He is currently a Junior Researcher with the Intelligence S/W Team, Hanwha
Systems Co., Ltd., Seongnam, South Korea. His current research interests include image
processing and computational imaging.
Taeyoung Lee received the B.S. degree in information and control engineering from the Robotics School, Kwangwoon University, Seoul, South Korea, in 2009, and the M.S. degree in control and instrumentation engineering from the Robotics School, Kwangwoon University, Seoul, South Korea, in 2011. He is currently a Senior Researcher with the Intelligent S/W Team, Hanwha Systems Co., Ltd., Seongnam, South Korea. His current research interests include object detection, object tracking, and segmentation with deep learning, as well as generative AI.
Jongsik Ahn received the B.S. degree in mechanical engineering from Kyunghee University,
Suwon, South Korea, in 2017, and the M.S. degree from the school of electronic and
electrical engineering, Kyungpook National University, Daegu, South Korea, in 2022.
He is currently a Researcher with the Intelligent S/W Team, Hanwha Systems Co., Ltd.,
Seongnam, South Korea. His current research interests include infrared image object
detection, segmentation, and object tracking.
Haemoon Kim received the B.S. degree in electrical, electronic, and control engineering from Hankyong National University, Anseong, South Korea, in 2020, and the M.S. degree in computer science and engineering from Hanyang University, Ansan, South Korea, in 2022. He is currently a Junior Researcher with the Intelligence S/W Team, Hanwha Systems
Co., Ltd., Seongnam, South Korea. His current research interests include object detection,
instance segmentation, and aerial image processing.
Hyunhak Kim received the B.S. and M.S. degrees in biomechanical engineering from Sungkyunkwan University (SKKU), South Korea, in 2020 and 2022, respectively. He is currently a Junior Researcher with the Intelligence S/W Team, Hanwha Systems Co., Ltd., Seongnam, South Korea. His current research interests include reinforcement learning, object detection, image processing, and language models.
Seoyoung Kim received the B.S. degree in chemical and biological engineering from
Korea University, Seoul, South Korea, in 2020. She is currently a Junior Researcher
with the Intelligence S/W Team, Hanwha Systems Co., Ltd., Seongnam, South Korea. Her
current research interests include image processing and 3D vision.
Byungin Choi received the B.S., M.S., and Ph.D. degrees in electronic engineering from Hanyang University, Seoul, South Korea, in 2001, 2003, and 2008, respectively. He is currently the leader of the Intelligent S/W Team, Hanwha Systems Co., Ltd., Seongnam,
South Korea. His current research interests include object detection, object tracking,
and super resolution.