Yejin Kim 1
Changhoon Yim 2
- 1 (Intelligent Image Processing Laboratory, Konkuk University, Seoul, Korea, jinye96@konkuk.ac.kr)
- 2 (Intelligent Image Processing Laboratory, Konkuk University, Seoul, Korea, cyim@konkuk.ac.kr)
Copyright © The Institute of Electronics and Information Engineers (IEIE)
1. Introduction
Haze is a phenomenon in which particles in the air scatter light and obscure an
image [1]. As a result, outdoor images may have limited visibility. Haze-induced issues can
be critical, especially in traffic situations that require high visibility
in real time [2], where haze can contribute to car accidents, the suspension of aircraft operations, and difficulties in the docking of ships.
Various studies have been conducted to remove the haze effects from images to ensure
consistent visibility regardless of weather conditions. In addition, during disasters
such as fires, it is essential to secure visibility in the event of high smoke levels
in the air. Removing the haze effects in images is essential to observe situations
through video equipment such as CCTV to cope with disaster situations.
Consistently visible images have many advantages. Enhanced images after haze removal
can be used as data for various deep learning applications, such as object recognition
and tracking. In deep learning, the ability to recognize objects in images
can vary when the illumination changes or is weak [3]. Hence, dehazed images are highly desirable in various fields.
In robot vision, cameras are critical because they are responsible for the visual
capabilities. Obtaining clear images from the camera regardless of the weather conditions
can determine the performance of robot vision and mobility [4]. Dehazing can be used to obtain haze-free real-time images in driving environments
and can provide improved visibility during driving and parking [5].
Consistently available high visibility can also reduce the probability of accidents
in autonomous driving. Additionally, image dehazing can be applied to crime prevention
by assisting in identifying the face of a perpetrator in a hazy image. Hence, image
dehazing is important in the field of image processing and image enhancement. Because
image dehazing is an ill-posed problem, it is necessary to test various approaches
based on an atmospheric scattering model to solve it.
Many studies on image dehazing have been performed using convolutional neural
networks (CNNs) [6-8]. The DehazeNet method [6] estimates a transmission map directly from a hazy image. The AOD-Net method
[7] obtains a dehazed image from a hazy image using CNNs. More recently, methods that estimate feature maps in hazy images to produce haze-free
images have been studied [8].
One method [9] uses CNNs to estimate the depth, and another method [10] applies transfer learning using a simple encoder-decoder network structure with skip
connections. In this paper, we propose a way to estimate the transmission map indirectly
from depth estimation using CNNs to generate dehazed images based on an atmospheric
scattering model.
2. Related Works
In the past, haze removal was performed with conventional image processing, such as basic
contrast enhancement. However, most haze removal methods are based on
hypotheses or empirical priors. For example, the dark channel prior (DCP) method
[12] is based on the observation that, in most local patches of haze-free outdoor color images, at least one of the three RGB
channels has a very low intensity. With advancements in deep learning, several CNN-based methods that can remove haze within an
image were developed. This led to the
development of the DehazeNet method [6], which is based on an atmospheric scattering model and estimates the transmission
map from a hazy image through a CNN. An illustration of the atmospheric scattering
model is shown in Fig. 1.
The atmospheric scattering model [11] can be represented as:

$$I(x) = J(x)\,t(x) + \alpha\,\bigl(1 - t(x)\bigr) \qquad (1)$$

From Eq. (1), the clean (haze-free) image $J(x)$ can be expressed as:

$$J(x) = \frac{I(x) - \alpha}{t(x)} + \alpha \qquad (2)$$

In Eq. (1), $I(x)$ is the hazy image, $J(x)$ is the clean image, $t(x)$ is
the transmission map, and $\alpha$ is the global atmospheric light. The transmission
map $t(x)$ [1,6] can be expressed as:

$$t(x) = e^{-\beta d(x)} \qquad (3)$$

In Eq. (3), $\beta$ is the scattering coefficient, and $d(x)$ is the depth (distance).
Eq. (3) shows that depth information affects the transmission values $t(x)$.
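To make Eqs. (1) and (3) concrete, the short NumPy sketch below synthesizes a hazy image from a clean image and a depth map using the forward model. The array shapes, the fixed atmospheric light value, and the normalized depth range are assumptions made only for illustration, not settings taken from this paper.

```python
import numpy as np

def synthesize_haze(clean, depth, beta=1.0, alpha=0.9):
    """Forward atmospheric scattering model (illustrative sketch).

    clean: H x W x 3 haze-free image scaled to [0, 1]
    depth: H x W depth map, assumed normalized to [0, 1]
    """
    t = np.exp(-beta * depth)[..., np.newaxis]   # Eq. (3): transmission from depth
    hazy = clean * t + alpha * (1.0 - t)         # Eq. (1): I(x) = J(x)t(x) + alpha(1 - t(x))
    return np.clip(hazy, 0.0, 1.0)
```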
Typical image dehazing methods estimate the transmission map. In some previous
works, the depth information was used for a similar problem of fog removal. Fog effects
have been removed using depth estimation, which is based on the assumption that the
difference between brightness and saturation becomes larger as the depth becomes larger
[19]. Depth values have been estimated from the degree of blur for single-image fog removal
[20]. Unlike previous studies, we propose the application of depth information with deep
learning for the estimation of transmission maps. In the proposed method, the depth
is estimated using deep learning methods, which give more accurate depth values.
DehazeNet [6] is a typical haze removal method that uses CNNs to estimate the transmission map
and obtain a dehazed image from it based on an atmospheric scattering model. It requires
guided image filtering as a post-processing procedure to refine the transmission map.
An advantage of this method is that CNNs are used for deep learning to solve the image
dehazing problem.
Unlike the DehazeNet [6] method, the AOD-Net [7] method combines a transmission map and global atmospheric light parameters into a
single parameter function $K\left(x\right)$, which can be learned through deep learning
networks. An advantage of AOD-Net is that it is an end-to-end deep learning network.
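For reference, the single-parameter reformulation can be restated in the notation of Eqs. (1) and (2); the bias term $b$ and this particular factoring follow the AOD-Net formulation as commonly presented, so the exact form should be read as a restatement rather than a new result:

$$J(x) = K(x)\,I(x) - K(x) + b, \qquad K(x) = \frac{\dfrac{1}{t(x)}\bigl(I(x) - \alpha\bigr) + (\alpha - b)}{I(x) - 1},$$

where $b$ is a constant bias. Substituting $K(x)$ back gives $J(x) = \frac{I(x)-\alpha}{t(x)} + \alpha$, which matches Eq. (2), so a single network that outputs $K(x)$ is sufficient for end-to-end dehazing.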
The densely connected pyramid dehazing network (DCPDN) [13] is a GAN-based method that can produce realistic dehazed images. In this
method, separate networks are used for learning the transmission map and the global atmospheric
light, and a joint discriminator is used to generate a haze-free image. A recent
method called FFA-Net [8] learns channel attention and pixel attention maps on a block-by-block basis. It learns
the feature maps with residual connections and removes the haze by combining them with the hazy image.
Deep learning networks have been used for depth estimation as well as haze removal.
The Monodepth method [14] learns from stereo images and can predict depth information from a single image,
whereas the Densedepth method [10] can estimate depth information using transfer learning. In the Monodepth method [14], stereo image pairs from the KITTI dataset [15] are used for training: a disparity map is estimated from the left image
and used to reconstruct the right image, and the error between the reconstructed and
actual right image drives the learning. By repeating this process in the opposite
direction as well, the method enforces consistency between the two views, which allows
stereo views and depth information to be recovered.
Monodepth2 [9] is a follow-up to the Monodepth method [14]. It exploits the characteristics of the KITTI dataset [15], which was constructed from consecutive images captured by a moving car. In Monodepth2,
the results can be corrected through reprojection between consecutive frames, which leads
to fewer errors. The Densedepth method [10] uses an encoder-decoder structure whose layers are interconnected
through skip connections. In this method, the KITTI dataset [15] and the NYU Depth V2 dataset [16] can be used to estimate depth information for both indoor and outdoor images.
Table 1. Parameters of depth estimation networks.

| Parameter | Monodepth2 | Densedepth |
|---|---|---|
| Training dataset | KITTI dataset | NYU2 depth dataset, KITTI dataset |
| Batch size | 12 | 4 |
| Epochs | 20 | 20 |
| Learning rate | 0.0001 | 0.0001 |
| Min depth | 0.1 | 10 |
| Max depth | 100.0 | 1000 |
| Optimizer | Adam | Adam |
Fig. 1. Atmospheric scattering model.
Fig. 2. Sequence diagram of the proposed method.
Fig. 3. Network structure of the Monodepth2 method for depth estimation.
3. The Proposed Method
The proposed method generates a transmission map indirectly from a depth map,
which can be generated by using previous depth estimation networks based on deep learning.
Then, we obtain a dehazed image using a transmission map based on an atmospheric scattering
model. Fig. 2 shows a sequence diagram of the proposed method.
As shown in Fig. 2, the depth map is estimated before the estimation of the transmission map. For the
depth map, a training process is performed using depth estimation networks based on
deep learning. After the training process is complete, a depth estimation model is
obtained. Once the depth estimation model is obtained, image dehazing can be performed
on a hazy image. For a hazy input image, we estimate the depth map using the depth
estimation model. The transmission map is estimated from the depth map using the relationship
described in Eq. (3). Finally, we obtain the dehazed image using the atmospheric scattering model described
in Eq. (2).
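As an illustration of this sequence, the following Python/NumPy sketch outlines the computation under stated assumptions: estimate_depth is a placeholder for a pretrained depth estimation model (e.g., Monodepth2 or Densedepth), the depth map is normalized to [0, 1] before applying Eq. (3), and the atmospheric light and the lower bound on the transmission are illustrative values rather than those used in the experiments.

```python
import numpy as np

def dehaze_from_depth(hazy, estimate_depth, beta=1.0, alpha=0.9, t_min=0.1):
    """Sketch of the proposed pipeline: hazy image -> depth map -> transmission map -> dehazed image.

    hazy          : H x W x 3 array scaled to [0, 1]
    estimate_depth: callable returning an H x W depth map for the input image (assumed)
    """
    depth = estimate_depth(hazy)                                    # depth estimation network
    d = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)  # normalize depth (assumption)
    t = np.exp(-beta * d)                                           # Eq. (3): t(x) = exp(-beta * d(x))
    t = np.clip(t, t_min, 1.0)[..., np.newaxis]                     # avoid division by near-zero transmission
    dehazed = (hazy - alpha) / t + alpha                            # Eq. (2)
    return np.clip(dehazed, 0.0, 1.0)
```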
Two methods were tested for the depth estimation model. The first method is Monodepth2
[9], which allows the correction of loss values in learning by applying additional information
to the network for the depth estimation. The second method is Densedepth [10], which uses transfer learning with both indoor and outdoor image data.
The network structure of the Monodepth2 method is shown in Fig. 3. The depth network is based on the U-Net structure, which enables the prediction
of the overall depth information. The pose network assists in predicting the depth
information from the motion between the preceding and following image frames.
Using the information from the pose network, the networks adjust their parameters to
generate a depth map. For this method, training is conducted on the KITTI dataset
[15] in three configurations: mono images, stereo images, and both mono and stereo images.
Fig. 4 presents the detailed network structure of the Densedepth method [10]. Its encoder network was originally developed for image classification, and the encoder-decoder
structure is used here to estimate the depth. The training of this method uses
the KITTI data and the NYU2 depth data; the NYU2 depth data are indoor data, and the KITTI
data are outdoor data.
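To illustrate the encoder-decoder pattern with skip connections discussed above, the following toy TensorFlow/Keras sketch shows the general structure only; it is not the Densedepth architecture (which uses a DenseNet-169 encoder), and the layer widths and input size are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def toy_depth_net(input_shape=(192, 640, 3)):
    """Toy encoder-decoder with skip connections (illustrative only, not Densedepth itself)."""
    inp = tf.keras.Input(shape=input_shape)
    # Encoder: progressively downsample while widening channels.
    e1 = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(inp)
    e2 = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(e1)
    e3 = layers.Conv2D(128, 3, strides=2, padding="same", activation="relu")(e2)
    # Decoder: upsample and concatenate the matching encoder features (skip connections).
    d2 = layers.Concatenate()([layers.UpSampling2D()(e3), e2])
    d2 = layers.Conv2D(64, 3, padding="same", activation="relu")(d2)
    d1 = layers.Concatenate()([layers.UpSampling2D()(d2), e1])
    d1 = layers.Conv2D(32, 3, padding="same", activation="relu")(d1)
    # Single-channel output interpreted as a relative depth map in [0, 1].
    out = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(layers.UpSampling2D()(d1))
    return Model(inp, out)
```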
4. Experimental Results
Table 1 presents the parameters used for training Monodepth2 and Densedepth in the experiments.
4.1 Results of Depth Estimation Networks
Fig. 5 shows the experimental results using Monodepth2. Fig. 5(a) shows a test image of the Berkeley dataset [17] as the input. Figs. 5(b)-(d) show the resulting depth maps using mono images as the training data. Figs. 5(b) and (c) show the results of training with image sizes of 640 ${\times}$ 192 and 1024 ${\times}$
320, respectively.
Fig. 5(d) shows the result of training with the same image size of 640 ${\times}$ 192 as in
Fig. 5(b), but without applying the pose network. Figs. 5(e)-(g) show the resulting depth maps using stereo images as the training data.
Figs. 5(e) and (f) show the results of training with image sizes of 640 ${\times}$ 192 and 1024 ${\times}$
320, respectively. Fig. 5(g) shows the resulting depth map without applying the pose network with a size of 640
${\times}$ 192, which is the same as the size in Fig. 5(e).
Figs. 5(h)-(j) show the resulting depth maps using both mono and stereo images as the training data.
Figs. 5(h) and (i) show the results of training with sizes of 640 ${\times}$ 192 and 1024 ${\times}$
320, respectively. Fig. 5(j) shows the result of training without applying the pose network with a size of 640 ${\times}$ 192,
which is the same as the size in Fig. 5(h). There is a tendency for small objects to be perceived at farther distances and for
large objects to be perceived at nearer distances. If the pose network is not applied,
the overall depth outlines are blurred, and the depth estimation values become less
accurate.
Fig. 6 shows the experimental results obtained using Densedepth. The encoder part of the Densedepth
network was set to DenseNet-169 [21] for the experiments. We compared the results of depth estimation for indoor and outdoor
images using the NYU2 depth dataset and KITTI dataset.
Fig. 6(a) shows the indoor image data [16] used as the input for Figs. 6(b) and (c). Fig. 6(d) shows the outdoor image data [19] used as the input for Figs. 6(e) and (f). Figs. 6(b) and (e) show the depth map images obtained by training using the NYU2 depth dataset. Figs. 6(c) and (f) show the depth map images obtained by training using the KITTI dataset.
The NYU2 depth dataset provides indoor image data, and the KITTI dataset provides
outdoor image data, so the resulting depth maps are different. With the NYU2 depth
dataset, the results preserve more edges of objects. The results obtained with the
NYU2 depth dataset show more detailed depth results than those obtained with the KITTI
dataset for indoor images. For the outdoor images, the results obtained with the NYU2
depth dataset cannot predict the overall depth map, while the results with the KITTI
dataset can predict the depth map more evenly.
Fig. 4. Detailed network structure of the Densedepth method (a) Encoder network, (b) Decoder network (AV: average pooling, CC: concatenate, CV: convolution, DB: dense block, GAP: global average pooling, MP: max pooling, SM: softmax, US: up-sampling).
Fig. 5. Results of depth estimation by training using the various configurations of data (a) Input image, (b) Result with mono image (640${\times}$192), (c) Result with mono image (1024 ${\times}$ 320), (d) Result with mono image (640 ${\times}$ 192) without the pose network, (e) Result with stereo images (640 ${\times}$ 192), (f) Result with stereo images (1024 ${\times}$ 320), (g) Result with stereo images (640 ${\times}$ 192) without the pose network, (h) Result with mono and stereo images (640 ${\times}$ 192), (i) Result with mono and stereo images (1024 ${\times}$ 320), (j) Result with mono and stereo images (640 ${\times}$ 192) without the pose network.
Fig. 6. Results of depth estimation by training using the various configurations of the dataset (a) Indoor input image, (b) Result depth map by [10] with NYU2 depth training data, (c) Result depth map by [10] with KITTI training data, (d) Outdoor input image, (e) Result depth map by [10] with NYU2 depth training data, (f) Result depth map by [10] with KITTI training data.
4.2 Transmission Map and Dehazed Image Obtained using the Proposed Method
For depth estimation, we used previously described depth estimation networks
[9,10]. Both networks were implemented in TensorFlow. For the test, unannotated
real-world hazy images were used [18]. Haze removal experiments were carried out by converting the depth map into the transmission
map using the relationship described in Eq. (3). Figs. 7 and 8 show the process of image dehazing using the proposed method.
Figs. 7(a) and 8(a) show the input hazy images [18]. Figs. 7(b) and 8(b) show the depth maps from the input images using the depth estimation model with the
depth estimation network. Figs. 7(c) and 8(c) show the visualized transmission maps from the depth map. Figs. 7(d) and 8(d) show the dehazed images after the haze removal is carried out from the transmission
map using the atmospheric scattering model described in Eq. (2). In these results, the depth value is lower for nearby objects, so the transmission
value becomes higher there. In addition, objects at farther distances are changed more by the dehazing.
Fig. 7. Process of image dehazing using the proposed method (a) Input image, (b) Depth information, (c) Transmission map (visualized), (d) Dehazed image obtained using the proposed method.
Fig. 8. Process of image dehazing using the proposed method (a) Input image, (b) Depth map, (c) Transmission map (visualized), (d) Dehazed image obtained using the proposed method.
4.3 Result Comparison of Proposed Method and DehazeNet
DehazeNet [6] directly generates a transmission map from a hazy image using CNNs. Comparisons
of the results for hazy natural outdoor images for the proposed method and the DehazeNet
method are shown in Figs. 9-11. Figs. 9(a)-(c) show the hazy images obtained using Bdd100k [17]. Figs. 10(a)-(c) show the dehazed images obtained using the DehazeNet method. Figs. 11(a)-(c) show the dehazed images obtained using the proposed method.
In the DehazeNet results, the haze effects are sufficiently removed, but the changes
in intensity levels are large, and the roads are excessively darkened because the
bright parts of the road are treated as haze. In the results obtained using the proposed method,
the haze is removed more evenly, and the road parts are dehazed correctly while preserving
the intensity levels without any darkening effects.
We also performed experiments using the synthetic objective testing set (SOTS)
[18] for the comparison of image dehazing results by DehazeNet and the proposed method
with Monodepth2 and Densedepth. Figs. 12 and 13 show the results with the PSNR values,
which were calculated using the groundtruth images of SOTS. The figures show that
the proposed method gives better PSNR results for image dehazing than the DehazeNet
method. The dehazed images from DehazeNet are darker than those from the proposed
method.
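For reference, the PSNR values in Figs. 12 and 13 follow the standard definition; a minimal sketch assuming 8-bit images (not necessarily the exact evaluation code used here) is:

```python
import numpy as np

def psnr(ground_truth, dehazed, peak=255.0):
    """Peak signal-to-noise ratio (dB) between a ground-truth image and a dehazed result."""
    mse = np.mean((ground_truth.astype(np.float64) - dehazed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)
```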
Fig. 9. Hazy Images obtained using the Bdd100k (a) Hazy image of a street, (b) Hazy image of a road with heavy traffic, (c) Hazy image of roadside trees.
Fig. 10. Dehazed images obtained using the DehazeNet method (a) Dehazed image of the street, (b) Dehazed image of a road with heavy traffic, (c) Dehazed image of roadside trees.
Fig. 11. Dehazed Images obtained using the proposed method (a) Dehazed image of the street, (b) Dehazed image of a road with heavy traffic, (c) Dehazed image of roadside trees.
Fig. 12. Image dehazing results with PSNR values obtained by the proposed method and DehazeNet (a) Hazy image of cityscapes, (b) Dehazed image using the proposed method with Monodepth2 (PSNR: 28.30), (c) Dehazed image using the proposed method with Densedepth (PSNR: 29.35), (d) Dehazed image using DehazeNet (PSNR: 27.60), (e) Groundtruth image.
Fig. 13. Image dehazing results with PSNR values obtained by the proposed method and DehazeNet (a) Hazy image of roadways, (b) Dehazed image using the proposed method with Monodepth2 (PSNR: 29.15), (c) Dehazed image using the proposed method with Densedepth (PSNR: 28.50), (d) Dehazed image using DehazeNet (PSNR: 27.99), (e) Groundtruth image.
Fig. 14. Results of a highly-hazed road image with various scattering coefficient values (a) Input image, (b) β = 0.2, (c) β = 0.4, (d) β = 0.6, (e) β = 0.8, (f) β = 1.0, (g) β = 1.2.
Fig. 15. Results of an airport image with various scattering coefficient values (a) Input image, (b) β = 0.2, (c) β = 0.4, (d) β = 0.6, (e) β = 0.8, (f) β = 1.0, (g) β = 1.2.
Fig. 16. Results of a parking lot image with various scattering coefficient values (a) Input image, (b) β = 0.2, (c) β = 0.4, (d) β = 0.6, (e) β = 0.8, (f) β = 1.0, (g) β = 1.2.
Fig. 17. Results of creek between buildings with various scattering coefficient values (a) Input image, (b) β = 0.2, (c) β = 0.4, (d) β = 0.6, (e) β = 0.8, (f) β = 1.0, (g) β = 1.2.
Fig. 18. Results of park image with various scattering coefficient values (a) Input image, (b) β = 0.2, (c) β = 0.4, (d) β = 0.6, (e) β = 0.8, (f) β = 1.0, (g) β = 1.2.
4.4 Comparison of the Results with Respect to β
The degree of change in the transmission map by depth value can be adjusted using
the scattering coefficient β in Eq. (3). Figs. 14-18 show the results obtained with β values of 0.2, 0.4, 0.6, 0.8,
1.0, and 1.2. As β increases, the transmission map changes more strongly with the depth
value, so the dehazed image differs more from the input hazy image. Conversely, as β decreases,
the transmission map changes less with the depth value.
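As a worked example (assuming a normalized depth value, chosen only for illustration), a pixel with $d(x)=0.8$ has a transmission of $t(x)=e^{-0.2\times 0.8}\approx 0.85$ at β = 0.2 but only $t(x)=e^{-1.2\times 0.8}\approx 0.38$ at β = 1.2, so Eq. (2) amplifies the difference $I(x)-\alpha$ at that pixel far more strongly for the larger β.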
In these results, a β value of 1 produces more appropriate dehazed images. Depending
on the characteristics of the original hazy image, a high β value can create excessively
dark areas in high-contrast regions such as shadows. If the estimated depth information does not match well with
the original image, a high β value may result in artifacts due to the errors in depth
information. If the depth information is somewhat similar to the original image, a
higher β value provides better dehazing effects. When applying the depth estimation
network trained with the KITTI dataset, the dehazing process was performed relatively
well for the road environment. The dehazing effects were relatively low when there
were buildings on both sides without any vehicles, as shown in Figs. 15 and 16.
5. Conclusion
In this paper, we proposed a novel technique for image dehazing by indirectly
creating a transmission map through the estimation of the depth map as opposed to
direct estimation of the transmission map in previous image dehazing methods. The
dehazing results using the proposed method were superior to those of previous methods
that generate the transmission map directly with post-processing from the input image.
However, the proposed method has limitations. First, because the training datasets
mainly cover road environments in daylight, the depth map estimation can be incorrect
for other types of scenes. Second, the atmospheric light value needs to be set adaptively,
since an appropriate value must be estimated for each test set to perform image dehazing.
Future research will be directed toward resolving these issues.
ACKNOWLEDGMENTS
This research was supported by the MSIT (Ministry of Science and ICT), Korea, under
the ITRC (Information Technology Research Center) support program (IITP-2020-2016-0-00465)
supervised by the IITP (Institute for Information & communications Technology
Planning & Evaluation). This work was also supported by a National Research Foundation of
Korea (NRF) grant funded by the Korean government (MSIT) (NRF-2019R1H1A2079873).
REFERENCES
Narasimhan S. G., Nayar S. K., Jul. 2002, Vision and the atmosphere, Int. J. Comput.
Vision, Vol. 48, No. 3, pp. 233-254
Jingkun Z., Sep. 2015, Analysis of causes and hazards of China’s frequent hazy weather,
The Open Cybernetics & Systemics Journal, Vol. 9, pp. 1311-1314
Yan Z., Zhang H., Wang B., Paris S., Yu Y., 2016, Automatic photo adjustment using
deep learning, ACM Trans. Graphics, Vol. 35, No. 2, pp. 11
Cowan C. K., Kovesi P. D., May. 1988, Automatic sensor placement from vision task
requirements, IEEE Trans. Pattern Analysis Machine Intelligence, Vol. 10, No. 3, pp.
407-416
Lee S., Maik V., Jang J., Shin J., Paik J., May. 2005, Noise-adaptive spatio-temporal
filter for real-time noise removal in low light level images, IEEE Trans. Consumer
Electronics, Vol. 51, No. 2, pp. 648-653
Cai B., Xu X., Jia K., Qing C., Tao D., Jan. 2016, DehazeNet: an end-to-end system
for single image haze removal, IEEE Trans. Image Processing, Vol. 25, No. 11, pp.
5187-5198
Li B., Peng X., Wang Z., Xu J-Z., Feng D., 2017, AOD-Net: all-in-one dehazing Network,
IEEE Int. Conf. Computer Vision, pp. 4770-4778
Qin X., Wang Z., Bai Y., Xie X., Jia H., 2019, FFA-Net: Feature
fusion attention network for single image dehazing, arXiv preprint arXiv:1911.07559
Godard C., Aodha O. M., Brostow G. J., 2019, Digging into self-supervised monocular
depth estimation, IEEE Int. Conf. Computer Vision
Alhashim I., Wonka P., 2018, High quality monocular depth estimation via transfer
learning, arXiv e-prints, abs/1812.11941
McCartney E. J., 1976, Optics of the atmosphere: Scattering by molecules and particles,
New York, NY, USA: Wiley
He K., Sun J., Tang X., Dec. 2011, Single image haze removal using dark channel prior,
IEEE Trans. Pattern Analysis Machine Intelligence, Vol. 33, No. 12, pp. 2341-2353
Zhang H., Patel V. M., 2018, Densely connected pyramid dehazing network, IEEE Int.
Conf. Computer Vision Pattern Recognition, pp. 3194-3203
Godard C., Aodha O. M., Brostow G. J., 2017, Unsupervised monocular depth estimation
with left-right consistency, IEEE Int. Conf. Computer Vision Pattern Recognition,
pp. 270-279
Geiger A., Lenz P., Stiller C., Urtasun R., Sep. 2013, Vision meets robotics: The
KITTI dataset, International Journal of Robotics Research, Vol. 32
Silberman N., Hoiem D., Kohli P., Fergus R., 2012, Indoor segmentation and support
inference from rgbd images, European Conf. Computer Vision
Yu F., Chen H., Wang X., Xian W., Chen Y., Liu F., Madhavan V., Darrell T., 2020,
Bdd100k: a diverse driving dataset for heterogeneous multitask learning, IEEE Conf.
Computer Vision Pattern Recognition
Li B., Ren W., Fu D., Tao D., Feng D., Zeng W., Wang Z., Aug. 2019, Benchmarking
single image dehazing and beyond, IEEE Transactions on Image Processing, Vol. 28,
No. 1, pp. 492-505
Pal D., Arora A., 2018, Removal of fog effect from highly foggy images using depth
estimation and fuzzy contrast enhancement method, International Conference on Computing
Communication and Automation, pp. 1-6
Jiwani M. A., Dandare S. N., Jun 2013, Single image fog removal using depth estimation
based on blur estimation, International Journal of Scientific and Research Publications,
Vol. 3, No. 6, pp. 1-6
Huang G., Liu Z., Maaten L., Weinberger K. Q., 2017, Densely connected convolutional
networks, IEEE Conference on Computer Vision and Pattern Recognition, pp. 2261-2269
Author
Yejin Kim received a BSc in Software from Konkuk University, Korea, in 2018. Currently,
she is a graduate student at the Department of Software at Konkuk University and a
researcher in the Intelligent Image Processing Laboratory. Her research interests
include image dehazing via deep learning.
Changhoon Yim received a BSc from the Department of Control and Instrumentation
Engineering, Seoul National University, Korea, in 1986, an MSc in Electrical and Electronics
Engineering from the Korea Advanced Institute of Science and Technology in 1988, and
a PhD in Electrical and Computer Engineering from the University of Texas at Austin
in 1996. He worked as a research engineer at the Korean Broadcasting System from 1988
to 1991. From 1996 to 1999, he was a member of the technical staff in the HDTV and
Multimedia Division, Sarnoff Corporation, New Jersey, USA. From 1999 to 2000, he worked
at Bell Labs, Lucent Technologies, New Jersey, USA. From 2000 to 2002, he was a Software
Engineer at KLA-Tencor Corporation, California, USA. From 2002 to 2003, he was a Principal
Engineer at Samsung Electronics, Suwon, Korea. Since 2003, he has been a faculty member
and is currently a professor in the Department of Computer Science and Engineering,
Konkuk University, Seoul, Korea. His research interests include digital image processing,
video processing, multimedia communication, and deep learning.