1. (Intelligent Image Processing Laboratory, Konkuk University, Seoul, Korea jinye96@konkuk.ac.kr)
2. (Intelligent Image Processing Laboratory, Konkuk University, Seoul, Korea cyim@konkuk.ac.kr)

## 1. Introduction

Haze is a phenomenon in which particles in the air scatter light and obscure an image [1]. As a result, outdoor images may suffer from limited visibility. Haze-induced problems can be fatal in traffic situations that require high visibility in real time [2], and they are associated with car accidents, suspension of aircraft operations, and difficulty in docking ships. Various studies have been conducted to remove haze effects from images and ensure consistent visibility regardless of weather conditions. In addition, during disasters such as fires, it is essential to secure visibility when smoke levels in the air are high. Removing haze effects from images is therefore essential for observing and coping with disaster situations through video equipment such as CCTV.

Consistently visible images have many advantages. Enhanced images after haze removal can be used as data for various deep learning applications, such as object recognition and tracking. In deep learning, the performance of recognizing objects in images may vary even with slight variations in illumination [3]. Hence, dehazed images are highly desirable in various fields.

In robot vision, cameras are critical because they are responsible for the visual capabilities. Obtaining clear images from the camera regardless of the weather conditions can determine the performance of robot vision and mobility [4]. Dehazing can be used to obtain haze-free real-time images in driving environments and can provide improved visibility during driving and parking [5].

Consistently available high visibility can also reduce the probability of accidents in autonomous driving. Additionally, image dehazing can be applied to crime prevention by assisting in identifying the face of a perpetrator in a hazy image. Hence, image dehazing is important in the field of image processing and image enhancement. Because image dehazing is an ill-posed problem, it is necessary to test various approaches based on an atmospheric scattering model to solve it.

Many studies on image dehazing have been performed using convolutional neural networks (CNNs) [6-8]. The DehazeNet method [6] estimates a transmission map directly from a hazy image. The AOD-Net method [7] obtains a dehazed image from a hazy image using CNNs. Recently, various methods have been proposed that estimate feature maps from hazy images to produce haze-free images [8].

One method [9] uses CNNs to estimate the depth, and another method [10] applies transfer learning using a simple encoder-decoder network structure with skip connections. In this paper, we propose a way to estimate the transmission map indirectly from depth estimation using CNNs to generate dehazed images based on an atmospheric scattering model.

## 2. Related Works

In the past, haze removal was performed using image processing, in which basic contrast enhancement could be used. However, most haze removal methods are based on hypotheses or empirical evidence. For example, the dark channel prior (DCP) method [12] is based on the hypothesis that there is a low value in at least one of three RGB channels in color images. Several methods using CNNs that can remove haze within an image were developed with advancements in deep learning techniques. This led to the development of the DehazeNet method [6], which is based on an atmospheric scattering model and estimates the transmission map from a hazy image through a CNN. An illustration of the atmospheric scattering model is shown in Fig. 1.
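The dark channel hypothesis can be illustrated with a short sketch. The following is a minimal NumPy illustration and not the implementation of [12]; the function name `dark_channel`, the odd patch size of 3, and the toy images are assumptions made for this example. It takes the minimum over the three color channels at each pixel and then the minimum over a local square patch.

```python
import numpy as np

def dark_channel(image: np.ndarray, patch: int = 3) -> np.ndarray:
    """Dark channel of an H x W x 3 image with values in [0, 1] (patch assumed odd)."""
    # Minimum over the three color channels at every pixel.
    min_rgb = image.min(axis=2)
    # Edge-pad so that the patch minimum keeps the H x W shape.
    pad = patch // 2
    padded = np.pad(min_rgb, pad, mode="edge")
    h, w = min_rgb.shape
    dark = np.empty_like(min_rgb)
    for y in range(h):
        for x in range(w):
            # Minimum over the patch x patch neighborhood centered at (y, x).
            dark[y, x] = padded[y:y + patch, x:x + patch].min()
    return dark

# Toy example: a saturated color has one low channel, a grayish hazy region does not.
clear = np.zeros((4, 4, 3))
clear[..., 0] = 0.9
clear[..., 1] = 0.6
clear[..., 2] = 0.05
hazy = np.full((4, 4, 3), 0.8)
print(dark_channel(clear).max(), dark_channel(hazy).min())
```

In this toy case the dark channel of the saturated patch stays near zero, while the uniformly bright patch keeps a high dark channel value, which is the cue that DCP interprets as haze.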

The atmospheric scattering model [11] can be represented as:

##### (1)
$I(x) = J(x)t(x) + \alpha\left(1 - t(x)\right).$

From Eq. (1), the clean (haze-free) image $J\left(x\right)$ can be expressed as:

##### (2)
$J(x) = \frac{1}{t(x)}I(x) - \frac{1}{t(x)}\alpha + \alpha.$

In Eq. (1), $I(x)$ is the hazy image, $J(x)$ is the clean image, $t(x)$ is the transmission map, and $\alpha$ is the global atmospheric light. The transmission map $t(x)$ [1,6] can be expressed as:

##### (3)
$t(x) = e^{-\beta d(x)}.$

In Eq. (3), $\beta$ is the scattering coefficient, and $d(x)$ is the depth (distance). Eq. (3) shows that depth information affects the transmission values $t(x)$.
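Eqs. (1)-(3) can be checked numerically with a small sketch. The function names, the 2 x 2 single-channel arrays, and the values $\beta = 1.0$ and $\alpha = 0.9$ below are assumptions chosen only for illustration. The sketch synthesizes a hazy image with Eq. (1) and recovers the clean image with Eq. (2).

```python
import numpy as np

def transmission_from_depth(depth, beta):
    """Eq. (3): t(x) = exp(-beta * d(x))."""
    return np.exp(-beta * depth)

def add_haze(clean, t, alpha):
    """Eq. (1): I(x) = J(x) t(x) + alpha (1 - t(x))."""
    return clean * t + alpha * (1.0 - t)

def remove_haze(hazy, t, alpha):
    """Eq. (2): J(x) = I(x) / t(x) - alpha / t(x) + alpha."""
    return hazy / t - alpha / t + alpha

# Assumed toy data: depth grows from the top-left to the bottom-right pixel.
depth = np.array([[0.1, 0.5], [1.0, 2.0]])
clean = np.array([[0.2, 0.4], [0.6, 0.8]])

t = transmission_from_depth(depth, beta=1.0)
hazy = add_haze(clean, t, alpha=0.9)
recovered = remove_haze(hazy, t, alpha=0.9)
print(np.allclose(recovered, clean))  # True: Eq. (2) inverts Eq. (1)
```

Because Eq. (2) is the exact inverse of Eq. (1), the quality of the recovered image depends entirely on how well $t(x)$ and $\alpha$ are estimated.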

Typical image dehazing methods estimate the transmission map. In some previous works, the depth information was used for a similar problem of fog removal. Fog effects have been removed using depth estimation, which is based on the assumption that the difference between brightness and saturation becomes larger as the depth becomes larger [19]. Depth values have been estimated from the degree of blur for single-image fog removal [20]. Unlike previous studies, we propose the application of depth information with deep learning for the estimation of transmission maps. In the proposed method, the depth is estimated using deep learning methods, which give more accurate depth values.

DehazeNet [6] is a typical haze removal method that uses CNNs to estimate the transmission map and obtain a dehazed image from it based on an atmospheric scattering model. It requires guided image filtering as a post-processing procedure to refine the transmission map. An advantage of this method is that CNNs are used for deep learning to solve the image dehazing problem.

Unlike the DehazeNet [6] method, the AOD-Net [7] method combines a transmission map and global atmospheric light parameters into a single parameter function $K\left(x\right)$, which can be learned through deep learning networks. An advantage of AOD-Net is that it is an end-to-end deep learning network. The densely connected pyramid dehazing network (DCPDN) [13] is a GAN-based method that can produce an image similar to a dehazed image. In this method, separate networks are used for learning the transmission map and global atmospheric light so that it can generate a haze-free image through a joint discriminator. A recent method called FFA-Net [8] learns channel attention and pixel attention maps on a block-by-block basis. It removes the haze by concatenating a hazy image and learns the feature maps as residual networks.
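For reference, the single-function reformulation of AOD-Net can be written in the notation of Eq. (2) as follows. This restates the formulation of [7] as we understand it; the constant bias $b$ is a symbol from that work (with a default value of 1 assumed) and is not used elsewhere in this paper.

$J(x) = K(x)I(x) - K(x) + b, \qquad K(x) = \frac{\frac{1}{t(x)}\left(I(x) - \alpha\right) + \left(\alpha - b\right)}{I(x) - 1}.$

Substituting $K(x)$ back gives $J(x) = \frac{1}{t(x)}\left(I(x) - \alpha\right) + \alpha$, which agrees with Eq. (2), so the transmission map and the atmospheric light are absorbed into one learned quantity.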

Deep learning networks have been used for depth estimation as well as haze removal. The Monodepth method [14] learns from stereo images and can predict depth information from a single image, whereas the Densedepth method [10] can estimate depth information using transfer learning. In the Monodepth method [14], KITTI data [15] are used as stereo images to estimate the disparity map derived from the left image, which is consistent with the right image. The right image requires the disparity map estimated on the left image to calculate the error. As this process repeats, it is possible to create an image in the opposite direction, which allows the creation of stereo images and depth information.

Monodepth2 [9] is a follow-up to the Monodepth method [14]. It uses the characteristics of the KITTI dataset [15], which was constructed using consecutive images captured by a moving car. In Monodepth2, the results can be corrected through reprojection. This method leads to fewer errors because of the creation of stereo images. The Densedepth method [10] uses layers consisting of an encoder–decoder structure, which are interconnected using skip connections. In this method, KITTI data [15] and NYU Depth V2 data [16] can be used to estimate depth information for both indoor and outdoor images.

##### Table 1. Parameters of depth estimation networks.
| Parameter | Monodepth2 | Densedepth |
| --- | --- | --- |
| Training dataset | KITTI dataset | NYU2 depth dataset, KITTI dataset |
| Batch size | 12 | 4 |
| Epoch | 20 | 20 |
| Learning rate | 0.0001 | 0.0001 |
| Min depth | 0.1 | 10 |
| Max depth | 100.0 | 1000 |
| Optimizer | Adam | Adam |

## 3. The Proposed Method

The proposed method generates a transmission map indirectly from a depth map, which can be generated by using previous depth estimation networks based on deep learning. Then, we obtain a dehazed image using a transmission map based on an atmospheric scattering model. Fig. 2 shows a sequence diagram of the proposed method.

As shown in Fig. 2, the depth map is estimated before the estimation of the transmission map. For the depth map, a training process is performed using depth estimation networks based on deep learning. After the training process is complete, a depth estimation model is obtained. Once the depth estimation model is obtained, image dehazing can be performed on a hazy image. For a hazy input image, we estimate the depth map using the depth estimation model. The transmission map is estimated from the depth map using the relationship described in Eq. (3). Finally, we obtain the dehazed image using the atmospheric scattering model described in Eq. (2).
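The sequence described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the experiments: `estimate_depth` is a hypothetical stand-in for a trained depth network such as [9] or [10], the depth map is assumed to be normalized to [0, 1], and the lower bound `t_min` on the transmission is an assumed safeguard against division by very small values in Eq. (2).

```python
import numpy as np

def dehaze_from_depth(hazy, depth, beta=1.0, alpha=1.0, t_min=0.1):
    """Dehaze an H x W x 3 image in [0, 1] given an H x W depth map.

    `depth` is assumed to be normalized to [0, 1]; `t_min` is an assumed
    lower bound on t(x) that keeps the division in Eq. (2) stable.
    """
    t = np.exp(-beta * depth)                 # Eq. (3)
    t = np.clip(t, t_min, 1.0)[..., None]     # add an axis to broadcast over RGB
    clean = (hazy - alpha) / t + alpha        # Eq. (2), rearranged
    return np.clip(clean, 0.0, 1.0), t[..., 0]

def estimate_depth(hazy):
    """Hypothetical placeholder for a trained depth estimation model.

    Returns a vertical ramp (far at the top, near at the bottom) for demonstration only.
    """
    h, w, _ = hazy.shape
    return np.tile(np.linspace(1.0, 0.0, h)[:, None], (1, w))

hazy = np.full((4, 6, 3), 0.7)                # assumed uniform gray hazy input
depth = estimate_depth(hazy)
dehazed, t = dehaze_from_depth(hazy, depth, beta=1.0, alpha=0.9)
print(dehazed.shape, t.min(), t.max())
```

In this toy input, the bottom row (depth 0) is left unchanged, while the top row (largest depth) is modified the most, which matches the role of depth in Eq. (3).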

Two methods were tested for the depth estimation model. The first method is Monodepth2 [9], which allows the correction of loss values in learning by applying additional information to the network for the depth estimation. The second method is Densedepth [10], which uses transfer learning with both indoor and outdoor image data.

The network structure of the Monodepth2 method is shown in Fig. 3. The depth network is based on the U-Net structure, which enables the prediction of the overall depth information. The pose network assists in predicting the depth information from the movement of objects between the preceding and following image frames. Using the information from the pose network, the networks adjust the parameters to generate a depth map. For this method, training is conducted using the KITTI dataset [15] in three configurations: mono images, stereo images, and both mono and stereo images.

Fig. 4 presents the detailed network structure of the Densedepth method [10]. This network was originally applied for image classification, and the encoder–decoder structure method was used to estimate the depth. The training of this method uses the KITTI data and NYU2 depth data. The NYU2 depth data are indoor data, and the KITTI data are outdoor data.

## 4. Experimental Results

Table 1 presents the parameters used for training Monodepth2 and Densedepth in the experiments.
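The minimum and maximum depth entries in Table 1 bound the range of the predicted depth. As a hedged illustration of how such bounds are typically applied, the sketch below maps a network output in [0, 1] to a depth value; it assumes the convention that the network predicts a sigmoid disparity, as we understand the public Monodepth2 code to do, and the function shown here is written for this example rather than taken from the experiments.

```python
def disp_to_depth(disp: float, min_depth: float, max_depth: float) -> float:
    """Map a sigmoid disparity in [0, 1] to a depth in [min_depth, max_depth]."""
    min_disp = 1.0 / max_depth   # smallest disparity corresponds to the farthest depth
    max_disp = 1.0 / min_depth   # largest disparity corresponds to the nearest depth
    scaled_disp = min_disp + (max_disp - min_disp) * disp
    return 1.0 / scaled_disp

# Min depth 0.1 and max depth 100.0 are the Monodepth2 values listed in Table 1.
for disp in (0.0, 0.5, 1.0):
    print(disp, disp_to_depth(disp, min_depth=0.1, max_depth=100.0))
```

Under this convention, a disparity of 0 maps to the maximum depth and a disparity of 1 maps to the minimum depth, so depth decreases monotonically as the predicted disparity increases.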

### 4.1 Results of Depth Estimation Networks

Fig. 5 shows the experimental results using Monodepth2. Fig. 5(a) shows a test image of the Berkeley dataset [17] as the input. Figs. 5(b)-(d) show the resulting depth maps using mono images as the training data. Figs. 5(b) and (c) show the results of training with image sizes of 640 ${\times}$ 192 and 1024 ${\times}$ 320, respectively.

Fig. 5(d) shows the result of training with an image size of 640 ${\times}$ 192, as in Fig. 5(b), but without applying the pose network. Figs. 5(e)-(g) show the resulting depth maps using stereo images as the training data. Figs. 5(e) and (f) show the results of training with image sizes of 640 ${\times}$ 192 and 1024 ${\times}$ 320, respectively. Fig. 5(g) shows the resulting depth map without applying the pose network with a size of 640 ${\times}$ 192, which is the same as the size in Fig. 5(e).

Figs. 5(h)-(j) show the resulting depth maps using both mono and stereo images as the training data. Figs. 5(h) and (i) show the results of training with sizes of 640 ${\times}$ 192 and 1024 ${\times}$ 320, respectively. Fig. 5(j) shows the result of training without applying the pose network with a size of 640 ${\times}$ 192, which is the same as the size in Fig. 5(h). There is a tendency for small objects to be perceived at farther distances and for large objects to be perceived at nearer distances. If the pose network is not applied, the overall depth outlines are blurred, and the depth estimation values become less accurate.

Fig. 6 shows the experimental results obtained using Densedepth. The encoder part of the Densedepth network was set as DenseNet-169 [21] for the experiments. We compared the results of depth estimation for indoor and outdoor images using the NYU2 depth dataset and the KITTI dataset.

Fig. 6(a) shows the indoor image data [16] used as the input for Figs. 6(b) and (c). Fig. 6(d) shows the outdoor image data [19] used as the input for Figs. 6(e) and (f). Figs. 6(b) and (e) show the depth map images obtained by training using the NYU2 depth dataset. Figs. 6(c) and (f) show the depth map images obtained by training using the KITTI dataset.

The NYU2 depth dataset provides indoor image data, and the KITTI dataset provides outdoor image data, so the resulting depth maps are different. With the NYU2 depth dataset, the results preserve more edges of objects. The results obtained with the NYU2 depth dataset show more detailed depth results than those obtained with the KITTI dataset for indoor images. For the outdoor images, the results obtained with the NYU2 depth dataset cannot predict the overall depth map, while the results with the KITTI dataset can predict the depth map more evenly.

### 4.2 Transmission Map and Dehazed Image Obtained using the Proposed Method

For depth estimation, we used the previously described depth estimation networks [9,10]. Both networks were implemented in TensorFlow. For the test, unannotated real-world hazy images were used [18]. Haze removal experiments were carried out by converting the depth map into the transmission map using the relationship described in Eq. (3). Figs. 7 and 8 show the process of image dehazing using the proposed method.

Figs. 7(a) and 8(a) show the input hazy images [18]. Figs. 7(b) and 8(b) show the depth maps estimated from the input images using the depth estimation model. Figs. 7(c) and 8(c) show the visualized transmission maps obtained from the depth maps. Figs. 7(d) and 8(d) show the dehazed images obtained from the transmission maps using the atmospheric scattering model described in Eq. (2). In these results, the depth value becomes lower for nearby objects, and the transmission value becomes higher. In addition, objects at farther distances changed more.

### 4.3 Result Comparison of Proposed Method and DehazeNet

DehazeNet [6] directly generates a transmission map from a hazy image using CNNs. Comparisons of the results of the proposed method and the DehazeNet method on hazy natural outdoor images are shown in Figs. 9-11. Figs. 9(a)-(c) show the hazy images from the Bdd100k dataset [17]. Figs. 10(a)-(c) show the dehazed images obtained using the DehazeNet method. Figs. 11(a)-(c) show the dehazed images obtained using the proposed method.

In the DehazeNet method, haze effects are sufficiently removed, changes in the intensity levels are high, and the roads are excessively darkened from recognizing the bright parts of the road as haze. In the results obtained using the proposed method, the haze is removed more evenly, and the road parts are dehazed correctly while preserving the intensity levels without any darkening effects.

We also performed experiments using the synthetic objective testing set (SOTS) [18] for the comparison of image dehazing results by DehazeNet and the proposed method with Monodepth2 and Densedepth. Figs. 12 and 13 show the results with the PSNR values, which were calculated using the groundtruth images of SOTS. The figures show that the proposed method gives better PSNR results for image dehazing than the DehazeNet method. The dehazed images from DehazeNet are darker than those from the proposed method.
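For readers who wish to reproduce this kind of comparison, the PSNR between a dehazed image and its ground-truth image can be computed as in the sketch below. The peak value of 1.0 (images scaled to [0, 1]) and the averaging of the squared error over all pixels and color channels are assumptions; the exact settings used for Figs. 12 and 13 are not specified here.

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(diff ** 2)          # mean squared error over all pixels and channels
    if mse == 0.0:
        return float("inf")           # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

# Assumed toy data: a uniform error of 0.1 gives an MSE of 0.01.
ground_truth = np.full((8, 8, 3), 0.5)
dehazed = ground_truth + 0.1
print(psnr(ground_truth, dehazed))
```

With a uniform error of 0.1 on a [0, 1] scale, the result is approximately 20 dB; a higher PSNR indicates that the dehazed image is closer to the ground truth.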

### 4.4 Comparison of the Results with Respect to β

The degree of change in the transmission map by depth value can be adjusted using the scattering coefficient β in Eq. (3). Figs. 14-18 show the results obtained with various β values (0.2, 0.4, 0.6, 0.8, 1.0, and 1.2). As β increases, the transmission map changes more strongly with depth, so the dehazed image differs more from the input hazy image. Conversely, as β decreases, the transmission map changes less with depth.
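The effect of β in Eq. (3) can be seen with a small numerical sketch. The two depth values below (0.2 for a near pixel and 1.0 for a far pixel, on an assumed normalized scale) are illustrative and are not taken from the experiments.

```python
import math

def transmission(depth: float, beta: float) -> float:
    """Eq. (3) for a single depth value."""
    return math.exp(-beta * depth)

betas = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
near, far = 0.2, 1.0  # assumed normalized depths for a near and a far pixel

# Difference between near and far transmission for each beta.
gaps = [transmission(near, b) - transmission(far, b) for b in betas]
for beta, gap in zip(betas, gaps):
    print(f"beta={beta:.1f}  t_near={transmission(near, beta):.3f}  "
          f"t_far={transmission(far, beta):.3f}  gap={gap:.3f}")
```

Over this range of β, the gap between the near and far transmission values widens as β grows, so far regions are corrected more strongly relative to near regions.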

In these results, it is observed that the β value of 1 results in more appropriate dehazed images. Depending on the characteristics of the original hazy image, a disadvantage of creating an excessively dark area appears when β becomes high in high-contrast areas such as shadows. If the estimated depth information does not match well with the original image, a high β value may result in artifacts due to the errors in depth information. If the depth information is somewhat similar to the original image, a higher β value provides better dehazing effects. When applying the depth estimation network trained with the KITTI dataset, the dehazing process was performed relatively well for the road environment. The dehazing effects were relatively low when there were buildings on both sides without any vehicles, as shown in Figs. 15 and 16.

## 5. Conclusion

In this paper, we proposed a novel technique for image dehazing by indirectly creating a transmission map through the estimation of the depth map as opposed to direct estimation of the transmission map in previous image dehazing methods. The dehazing results using the proposed method were superior to those of previous methods that generate the transmission map directly with post-processing from the input image. However, the proposed method has limitations. First, the dataset covering the road environment in daylight could provide incorrect results for depth map estimation. Second, it is necessary to set the value of atmospheric light adaptively as each test set needs the appropriate atmospheric light value to be estimated for image dehazing. Future research should be directed to resolve these issues.

### ACKNOWLEDGMENTS

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2020-2016-0-00465) supervised by the IITP (Institute for Information & communications Technology Planning & Evaluation). This work was also supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (NRF-2019R1H1A2079873).

### REFERENCES

1. Narasimhan S. G., Nayar S. K., Jul. 2002, Vision and the atmosphere, Int. J. Comput. Vision, Vol. 48, No. 3, pp. 233-254
2. Jingkun Z., Sep. 2015, Analysis of causes and hazards of China’s frequent hazy weather, The Open Cybernetics & Systemics Journal, Vol. 9, pp. 1311-1314
3. Yan Z., Zhang H., Wang B., Paris S., Yu Y., 2016, Automatic photo adjustment using deep learning, ACM Trans. Graphics, Vol. 35, No. 2, pp. 11
4. Cowan C. K., Kovesi P. D., May 1988, Automatic sensor placement from vision task requirements, IEEE Trans. Pattern Analysis Machine Intelligence, Vol. 10, No. 3, pp. 407-416
5. Lee S., Maik V., Jang J., Shin J., Paik J., May 2005, Noise-adaptive spatio-temporal filter for real-time noise removal in low light level images, IEEE Trans. Consumer Electronics, Vol. 51, No. 2, pp. 648-653
6. Cai B., Xu X., Jia K., Qing C., Tao D., Jan. 2016, DehazeNet: an end-to-end system for single image haze removal, IEEE Trans. Image Processing, Vol. 25, No. 11, pp. 5187-5198
7. Li B., Peng X., Wang Z., Xu J-Z., Feng D., 2017, AOD-Net: all-in-one dehazing network, IEEE Int. Conf. Computer Vision, pp. 4770-4778
8. Qin X., Wang Z., Bai Y., Xie X., Jia H., 2019, FFA-Net: Feature fusion attention network for single image dehazing, arXiv preprint arXiv:1911.07559
9. Godard C., Aodha O. M., Brostow G. J., 2019, Digging into self-supervised monocular depth estimation, IEEE Int. Conf. Computer Vision
10. Alhashim I., Wonka P., 2018, High quality monocular depth estimation via transfer learning, arXiv e-prints, abs/1812.11941
11. McCartney E. J., 1976, Optics of the atmosphere: Scattering by molecules and particles, New York, NY, USA: Wiley
12. He K., Sun J., Tang X., Dec. 2011, Single image haze removal using dark channel prior, IEEE Trans. Pattern Analysis Machine Intelligence, Vol. 33, No. 12, pp. 2341-2353
13. Zhang H., Patel V. M., 2018, Densely connected pyramid dehazing network, IEEE Int. Conf. Computer Vision Pattern Recognition, pp. 3194-3203
14. Godard C., Aodha O. M., Brostow G. J., 2017, Unsupervised monocular depth estimation with left-right consistency, IEEE Int. Conf. Computer Vision Pattern Recognition, pp. 270-279
15. Geiger A., Lenz P., Stiller C., Urtasun R., Sep. 2013, Vision meets robotics: The KITTI dataset, International Journal of Robotics Research, Vol. 32
16. Silberman N., Hoiem D., Kohli P., Fergus R., 2012, Indoor segmentation and support inference from RGBD images, European Conf. Computer Vision
17. Yu F., Chen H., Wang X., Xian W., Chen Y., Liu F., Madhavan V., Darrell T., 2020, Bdd100k: a diverse driving dataset for heterogeneous multitask learning, IEEE Conf. Computer Vision Pattern Recognition
18. Li B., Ren W., Fu D., Tao D., Feng D., Zeng W., Wang Z., Aug. 2019, Benchmarking single image dehazing and beyond, IEEE Transactions on Image Processing, Vol. 28, No. 1, pp. 492-505
19. Pal D., Arora A., 2018, Removal of fog effect from highly foggy images using depth estimation and fuzzy contrast enhancement method, International Conference on Computing Communication and Automation, pp. 1-6
20. Jiwani M. A., Dandare S. N., Jun. 2013, Single image fog removal using depth estimation based on blur estimation, International Journal of Scientific and Research Publications, Vol. 3, No. 6, pp. 1-6
21. Huang G., Liu Z., Maaten L., Weinberger K. Q., 2017, Densely connected convolutional networks, IEEE Conference on Computer Vision and Pattern Recognition, pp. 2261-2269

## Author

##### Yejin Kim

Yejin Kim received a BSc in Software from Konkuk University, Korea, in 2018. Currently, she is a graduate student at the Department of Software at Konkuk University and a researcher in the Intelligent Image Processing Laboratory. Her research interests include image dehazing via deep learning.

##### Changhoon Yim

Changhoon Yim received a BSc from the Department of Control and Instrumentation Engineering, Seoul National University, Korea, in 1986, an MSc in Electrical and Electronics Engineering from the Korea Advanced Institute of Science and Technology in 1988, and a PhD in Electrical and Computer Engineering from the University of Texas at Austin in 1996. He worked as a research engineer at the Korean Broadcasting System from 1988 to 1991. From 1996 to 1999, he was a member of the technical staff in the HDTV and Multimedia Division, Sarnoff Corporation, New Jersey, USA. From 1999 to 2000, he worked at Bell Labs, Lucent Technologies, New Jersey, USA. From 2000 to 2002, he was a Software Engineer at KLA-Tencor Corporation, California, USA. From 2002 to 2003, he was a Principal Engineer at Samsung Electronics, Suwon, Korea. Since 2003, he has been a faculty member and is currently a professor in the Department of Computer Science and Engineering, Konkuk University, Seoul, Korea. His research interests include digital image processing, video processing, multimedia communication, and deep learning.