Many studies on image deblurring have been conducted, and deep learning methods for blind image deblurring have received considerable attention due to their good performance. Recently, the SelfDeblur method was proposed for blind image deblurring based on deep image prior (DIP). In the SelfDeblur method, two neural networks for an image generator and a blur kernel generator are learned simultaneously with only one blurry image. This shows the feasibility of blind image deblurring using unsupervised learning, since it requires no training process. In this paper, we propose a method to maximize the performance of blind image deblurring based on DIP. The optimal loss function for deep learning is studied for the SelfDeblur method, and the deblurring performance of the proposed method is stabilized and maximized using the image prior and the kernel prior for the total loss function. Extensive computer simulations show that the proposed method yields superior performance compared to conventional methods.

## 1. Introduction

Various studies have been conducted to develop image deblurring methods to restore
a blurry image, denoising methods to remove random noise in images, and image inpainting
methods to fill in damaged or missing parts of images ^{[1-30]}. Image deblurring methods have been studied to restore images that are damaged by
various blurring factors, including motion blur, which is caused by unstable motion
when acquiring images ^{[1]}. In image deblurring, the damaged image signal $y$ can be expressed by
the following equation:

##### (1)

$$ y=x * h+n, $$

in which an original image signal $x$ is convolved with a specific blur kernel $h$,
and after the convolution, additive noise $n$ is added. Deblurring is also regarded
as deconvolution in the sense that it undoes the convolution in Eq. (1) to remove the degradation caused by the blur kernel $h$ ^{[4]}. Image deblurring methods include non-blind and blind methods. In
non-blind methods, a blurry image is restored assuming that the blur kernel is known.
In blind image deblurring methods, a blurry image is restored without knowledge of the blur
kernel ^{[22,26]}. Blind image deblurring is a severely ill-posed problem that requires estimating
both the blur kernel and the clean image, and prior knowledge of the natural image
and the blur kernel can be used to find a solution ^{[23]}.
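The degradation model described above (an image convolved with a blur kernel, plus additive noise) can be illustrated with a minimal sketch. It uses plain Python lists to stay dependency-free; the function names `convolve2d_valid` and `blur`, the "valid" boundary handling, and the zero-mean Gaussian noise are assumptions made for illustration and are not taken from the paper.

```python
import random


def convolve2d_valid(image, kernel):
    """2-D convolution (kernel flipped), keeping only fully overlapping positions."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    flipped = [row[::-1] for row in kernel[::-1]]
    return [
        [
            sum(
                image[r + i][c + j] * flipped[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def blur(image, kernel, noise_std=0.0, seed=0):
    """Degrade `image` as y = x * h + n, with n assumed zero-mean Gaussian."""
    rng = random.Random(seed)
    clean = convolve2d_valid(image, kernel)
    return [[v + rng.gauss(0.0, noise_std) for v in row] for row in clean]


# Toy 4x4 image with one bright pixel and a 3x3 box kernel that sums to 1.
x = [[0.0] * 4 for _ in range(4)]
x[1][1] = 1.0
h = [[1.0 / 9.0] * 3 for _ in range(3)]
y = blur(x, h, noise_std=0.0)  # 2x2 output; the bright pixel is spread out
```

In the blind setting only `y` is observed, so both `x` and `h` have to be estimated, which is why priors on the image and the kernel are needed.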

For image deblurring, iterative algorithms based on the maximum-a-posteriori (MAP)
framework have been studied ^{[8]}, while deblurring methods using deep learning have been attracting much attention
due to their good performance ^{[16-30]}. Most deep learning methods for image deblurring are based on supervised learning,
and two kinds of approaches have been studied. The first
approach is to obtain the blur kernel first, and then restore the blurry image. The
second approach is to restore the clean image end to end, without
first obtaining the blur kernel ^{[22]}. Recently, SelfDeblur was proposed for blind image deblurring based on DIP ^{[28,29]}. In this system, two neural networks for an image generator and a blur kernel generator
are connected in series and simultaneously learned using only one blurry image. Since
a training process is not required in this system, it shows the feasibility of blind
image deblurring using unsupervised learning.

In this paper, a blind image deblurring method based on DIP is studied. We analyze the performance of SelfDeblur and propose a method to maximize its deblurring performance. For this purpose, we evaluate various loss functions for the learning of SelfDeblur and identify the one that performs best. We then propose the following method to stabilize and maximize the performance of the system. The learning iterations are divided into three intervals, and a different total loss function is used in each interval. In the first interval, the mean error is used for initial learning. In the second interval, the image prior and the kernel prior are used together in the total loss to stabilize the learning. In the third interval, the image prior and the kernel prior are dropped, and the total loss is composed of the mean error and the difference of structural characteristics of the image, in order to maximize the deblurring performance. Extensive computer simulations are performed using various test images, and the results show that the proposed method achieves higher PSNR than the conventional methods.

## 2. Blind Image Deblurring Methods

A blurry image signal can be expressed as Eq. (1). Blind image deblurring, in which an original image $x$ is restored from a blurry
image $y$ without knowing the blur kernel $h$, is a severely ill-posed problem for
which many solutions exist. A number of image deblurring methods have been studied
based on the following MAP framework ^{[23]}:

##### (2)

$$ \begin{aligned} (\hat{x}, \hat{h}) &=\arg \max _{(x, h)} \operatorname{Pr}(x, h \mid y) \\ &=\arg \max _{(x, h)} \operatorname{Pr}(y \mid x, h) \operatorname{Pr}(x) \operatorname{Pr}(h) \end{aligned} $$

As this equation shows, blind image deblurring can be viewed as a problem of finding the estimates $\hat{x}$ and $\hat{h}$ that maximize the posterior probability of $\left(x,h\right)$ given the blurry image $y$. $\Pr \left(y|x,h\right)$ represents the data fidelity as the likelihood of the blurry image $y$, given $\left(x,h\right)$. $\Pr \left(x\right)$ and $\Pr \left(h\right)$ represent the statistical characteristics of the natural image and the blur kernel, which can be expressed as the image prior and the blur kernel prior, respectively. This MAP-based technique is usually implemented by an iterative algorithm.
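As a worked step, the MAP estimate can be rewritten by taking the negative logarithm, which turns the product of probabilities into a sum and the maximization into a minimization. This is the standard manipulation. The Gaussian noise model in the last line, with variance $\sigma^{2}$, is an assumption added here for illustration and is not stated in the paper.

$$ \begin{aligned} (\hat{x}, \hat{h}) &=\arg \max _{(x, h)} \operatorname{Pr}(y \mid x, h) \operatorname{Pr}(x) \operatorname{Pr}(h) \\ &=\arg \min _{(x, h)}\left\{-\log \operatorname{Pr}(y \mid x, h)-\log \operatorname{Pr}(x)-\log \operatorname{Pr}(h)\right\}, \\ -\log \operatorname{Pr}(y \mid x, h) &=\frac{1}{2 \sigma^{2}}\left\|x * h-y\right\|^{2}+\text{const} \quad \text{(i.i.d. Gaussian noise } n \text{)}. \end{aligned} $$

Under this assumption the first term acts as a data fidelity term, while $-\log \Pr \left(x\right)$ and $-\log \Pr \left(h\right)$ act as regularizers on the image and on the blur kernel.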

Various methods have been proposed to solve the blind image deblurring problem. Levin
et al. studied a method to complement the MAP-based approach ^{[8]}, while Cho et al. proposed a fast motion deblurring method using an efficient prediction
step and image derivatives ^{[1]}. Sun et al. proposed a method for kernel estimation and image deblurring from a single
image via modeling image edge primitives using patch priors ^{[10]}, while Zuo et al. proposed an iteration-wise MAP framework for blind deconvolution
by using the generalized shrinkage-thresholding operator ^{[12]}. Perrone et al. proposed a total variation blind deconvolution method, which uses total variation as a sparse
gradient prior ^{[13]}, while Pan et al. proposed an effective blind image deblurring algorithm based on
an analysis of the convolution operation and its effect on the dark channel of blurred
images ^{[3]}. These methods are usually implemented using iterative algorithms. Recently, deep
learning techniques have been actively studied to improve the image deblurring performance.
Xu et al. studied a technique using a convolutional neural network (CNN) ^{[16]}, while Yan et al. studied a deblurring technique that connects classification and
regression networks ^{[17]}. Zhang et al. studied a technique using CNN and a recurrent neural network (RNN)
^{[18]}, while Tao et al. proposed a network using an encoder-decoder structure ^{[19]}.

A supervised learning method that requires a large number of training images is used
for most image deblurring methods based on deep learning. However, the deblurring
performance depends on the training images used in the supervised learning, and in
some cases, a large number of training images may be unavailable. On the other hand,
Ren et al. proposed SelfDeblur, which can be viewed as an unsupervised learning method
based on DIP for blind image deblurring ^{[28,29]}. In SelfDeblur, two neural networks, which generate a clean image and a blur kernel,
respectively, are connected in series, and both neural networks are learned simultaneously
with only one blurry image. This study is important because it demonstrates
that an unsupervised learning method without a training process can be successfully
applied to blind image deblurring.

## 3. Blind Image Deblurring Method based on Deep Image Prior

With the DIP method, it has been demonstrated that a deep neural network can be learned
to generate a clean image with only one damaged image due to the structural characteristics
of the deep neural network ^{[30]}. The deep neural network can be learned to generate a clean image that is close to
the original image, if the loss is minimized between the damaged image and the output
of the encoder-decoder structure with the skip connection shown in Fig. 1 ^{[28-30]}. It has been shown that DIP can be applied to image restoration problems such as
inpainting, denoising, and deblurring ^{[30]}. If the loss is minimized between the blurry image and the convolution of the blur
kernel and the image generator output, the image generator output can be learned to
create a clean image. In this case, the DIP plays the role of non-blind deblurring,
since it is necessary to know the blur kernel.

Ren et al. proposed SelfDeblur, which is a blind image deblurring method based on
DIP ^{[28,29]}. In SelfDeblur, the deep neural network for generating a clean image is implemented
using an autoencoder structure with skip connections used in DIP, as shown in Fig. 1. The neural network for the blur kernel generator is implemented using a fully connected
network, as shown in Fig. 2 ^{[28,29]}. A clean image generator and a blur kernel generator can be obtained if these two
neural networks are connected in series and the whole system is learned with only
one blurry image, as shown in Fig. 3. Since this method does not require a training process, it shows the feasibility
of unsupervised learning for blind image deblurring. However, since two neural networks
are learned with only one blurry image in this method, the learning stability may
be lower compared to that of supervised learning methods. Thus, developing a stable
and optimized learning method is very important for this system.

## 4. Proposed Blind Image Deblurring Method Based on Deep Image Prior

If the MAP framework of Eq. (2) is formulated in the log domain, it can be implemented as the following optimization
problem ^{[28]}:

##### (3)

$$ (\hat{x}, \hat{h})=\arg \min _{\left(G_{x}, G_{h}\right)}\left\{d\left(G_{x}\left(z_{x}\right) * G_{h}\left(z_{h}\right), y\right)+\alpha \cdot \phi\left(G_{x}\left(z_{x}\right)\right)+\beta \cdot \varphi\left(G_{h}\left(z_{h}\right)\right)\right\}, $$

s.t. $0 \leq\left(G_{x}\left(z_{x}\right)\right)_{i} \leq 1, \forall i,\left(G_{h}\left(z_{h}\right)\right)_{j} \geq 0, \forall j, \sum_{j}\left(G_{h}\left(z_{h}\right)\right)_{j}=1$.

In Eq. (3), $G_{x}\left(z_{x}\right)$ and $G_{h}\left(z_{h}\right)$ represent the outputs of
the image generator and the blur kernel generator shown in Fig. 3, respectively. Fig. 2 shows that softmax nonlinearity is applied to the kernel output layer so that the
coefficients sum to 1 ^{[28,29]}. The data fidelity between $G_{x}\left(z_{x}\right)*G_{h}\left(z_{h}\right)$
and $y$ is represented by $d\left(G_{x}\left(z_{x}\right)*G_{h}\left(z_{h}\right),y\right)$.
The $l_{k}$ norm of the error, which can be represented as $\left\|G_{x}\left(z_{x}\right)*G_{h}\left(z_{h}\right)-y\right\|^{k}$,
is usually used for $d\left(G_{x}\left(z_{x}\right)*G_{h}\left(z_{h}\right),y\right)$.
$\phi \left(G_{x}\left(z_{x}\right)\right)$ and $\varphi \left(G_{h}\left(z_{h}\right)\right)$
are the image prior and the kernel prior, respectively, and are used as regularizers
to supplement the data fidelity. The weights of the priors, which are denoted as $\alpha$
and $\beta$, can be adjusted. When $\left\|G_{x}\left(z_{x}\right)*G_{h}\left(z_{h}\right)-y\right\|^{k}$
is used for $d\left(G_{x}\left(z_{x}\right)*G_{h}\left(z_{h}\right),y\right)$, the $l_{1}$
norm of the error ($k=1$), which corresponds to the mean absolute error (MAE), or the $l_{2}$ norm
of the error ($k=2$), which corresponds to the mean square error (MSE), is usually used. The
structural similarity measure (SSIM) ^{[31]}, which represents the structural similarity of images, can also be used to calculate
$d\left(G_{x}\left(z_{x}\right)*G_{h}\left(z_{h}\right),y\right)$. Although MSE is
generally used as a loss function, it is proportional to the square of the error and
is more affected by outliers than MAE. While $1-\mathrm{SSIM}$ can show the difference of
structural similarity, it is not accurately proportional to the mean error.
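The kernel constraints in Eq. (3) (non-negative coefficients that sum to 1) are what a softmax output layer provides by construction. The following minimal sketch shows this on a flattened 3x3 kernel; the raw scores are hypothetical values chosen for illustration, not outputs of the actual kernel generator.

```python
import math


def softmax(values):
    """Map arbitrary real scores to positive weights that sum to 1."""
    shift = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs of a kernel generator for a flattened 3x3 kernel.
raw_scores = [0.2, -1.0, 3.5, 0.0, 1.2, -0.7, 2.1, 0.4, -2.3]
kernel = softmax(raw_scores)
```

Because the constraint is built into the output layer, it does not have to be enforced by an additional penalty term in the loss.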

In this paper, we study a method to stabilize and maximize the deblurring performance for the system shown in Fig. 3. First, we find the best loss function when the priors are not considered. The experimental results presented in Section 5 show that MAE is more suitable than MSE to calculate the mean error for this system. We attribute this to the fact that outliers can have a more serious impact on the learning process for this system than on a system based on normal supervised learning. The performance is also evaluated when $1-\mathrm{SSIM}$ is used as a loss function to consider the difference in the structural characteristics of the image. As shown in the experimental results of Section 5, the performance of the system is highest when MAE and $1-\mathrm{SSIM}$ are used together as a loss function, since the mean error and the structural similarity of images are then considered simultaneously.
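The outlier sensitivity discussed above can be made concrete with a small sketch. Two error patterns with the same MAE are compared: a small error spread over every pixel and a single large outlier. The `global_ssim` helper is a simplification that computes SSIM from whole-signal statistics rather than over local windows, so it only approximates the usual SSIM; the constants 0.01 and 0.03 are the commonly used SSIM defaults. All function names are illustrative assumptions.

```python
def mae(a, b):
    """Mean absolute error between two flattened images."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def mse(a, b):
    """Mean square error between two flattened images."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def global_ssim(a, b, data_range=1.0):
    """Simplified SSIM from global statistics (no sliding window)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((p - mu_a) ** 2 for p in a) / n
    var_b = sum((q - mu_b) ** 2 for q in b) / n
    cov = sum((p - mu_a) * (q - mu_b) for p, q in zip(a, b)) / n
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den


def combined_loss(pred, target):
    """MAE plus (1 - SSIM), the combination evaluated in Section 5."""
    return mae(pred, target) + (1.0 - global_ssim(pred, target))


target = [0.05 * i for i in range(10)]
spread = [t + 0.1 for t in target]      # error of 0.1 on every pixel
outlier = list(target)
outlier[4] = target[4] + 1.0            # one pixel off by 1.0

# Same MAE for both patterns, but the MSE of the outlier case is ten times larger.
mae_spread, mae_outlier = mae(spread, target), mae(outlier, target)
mse_spread, mse_outlier = mse(spread, target), mse(outlier, target)
```

A loss dominated by squared outliers produces large gradients from a few pixels, which is one plausible reading of why MAE behaves better here when only a single blurry image drives the learning.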

Then, we analyze the performance of the system when the priors in Eq. (3) are used. The total variation, which is denoted as $TV\left(G_{x}\left(z_{x}\right)\right)$, can be used as the image prior $\phi \left(G_{x}\left(z_{x}\right)\right)$, while the energy function of the blur kernel, which is denoted as $\left\|G_{h}\left(z_{h}\right)\right\|^{2}$, can be used as the blur kernel prior $\varphi \left(G_{h}\left(z_{h}\right)\right)$. Then Eq. (3) can be converted to the following equation ^{[2,5,28]}:

##### (4)

$$ \begin{gathered} (\hat{x}, \hat{h})=\arg \min _{\left(G_{x}, G_{h}\right)}\left\{d\left(G_{x}\left(z_{x}\right) * G_{h}\left(z_{h}\right), y\right)+\alpha \cdot T V\left(G_{x}\left(z_{x}\right)\right)\right. \\ \left.+\beta \cdot\left\|G_{h}\left(z_{h}\right)\right\|^{2}\right\} \end{gathered} $$

The image sparsity in the gradient domain is considered in the total variation regularizer
^{[1,2]}, and the use of the energy function as the kernel prior is theoretically based on
Tikhonov regularization ^{[1,5]}. When the blur kernel network and the image generator network are learned simultaneously
using only one blurry image as shown in Fig. 3, the learning stability may be lower compared to the case of supervised learning.
To stabilize and maximize the learning performance of the system, we propose the following
learning method, in which the learning iterations are divided into three intervals and a different
total loss function is used in each interval. In the first interval
of learning iterations, a mean error is used as the total loss for initial learning.
Then, in the second interval of learning iterations, the image prior and the kernel
prior are used together for the total loss. Total variation regularization as the
image prior is used to focus on salient parts of the image and the energy regularization
of the blur kernel is used to help with the point spread characteristic of the blur
kernel during the second interval of learning iterations. The use of the image and
the kernel priors increases the convergence stability of learning. After stabilizing
the convergence of learning, the learning is performed using the total loss function
that is composed of MAE and the difference of structural characteristics for the image,
without using the image and the kernel priors to maximize the deblurring performance
in the third interval of learning iterations. Algorithm 1 summarizes the proposed
method for blind image deblurring based on DIP. The experimental results in Section
5 show that the proposed method yields superior deblurring performance compared to
the conventional methods.
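The three-interval learning described above can be sketched as a loss selector that depends on the iteration index. This is an illustration under stated assumptions, not the authors' Algorithm 1: the interval boundaries `first_end` and `second_end`, the weights `alpha` and `beta`, the use of MAE as the data fidelity in the second interval, and the anisotropic form of the total variation are all hypothetical choices made here.

```python
def total_variation(img):
    """Anisotropic TV: sum of absolute differences between neighbouring pixels."""
    rows, cols = len(img), len(img[0])
    horiz = sum(
        abs(img[r][c + 1] - img[r][c]) for r in range(rows) for c in range(cols - 1)
    )
    vert = sum(
        abs(img[r + 1][c] - img[r][c]) for r in range(rows - 1) for c in range(cols)
    )
    return horiz + vert


def kernel_energy(kernel):
    """Squared l2 norm of the blur kernel coefficients."""
    return sum(v * v for row in kernel for v in row)


def total_loss(iteration, mae_value, one_minus_ssim, image, kernel,
               first_end=1000, second_end=2000, alpha=1e-6, beta=1e-4):
    """Pick the total loss for the current interval (all defaults are assumed)."""
    if iteration < first_end:
        # Interval 1: mean error only, for initial learning.
        return mae_value
    if iteration < second_end:
        # Interval 2: data fidelity plus image and kernel priors, as in Eq. (4).
        return (mae_value
                + alpha * total_variation(image)
                + beta * kernel_energy(kernel))
    # Interval 3: priors dropped; mean error plus structural difference.
    return mae_value + one_minus_ssim


# Toy inputs standing in for the generator outputs and precomputed loss terms.
image = [[0.0, 1.0], [1.0, 0.0]]
kernel = [[0.5, 0.5]]
loss_1 = total_loss(0, 0.2, 0.3, image, kernel)
loss_2 = total_loss(1500, 0.2, 0.3, image, kernel)
loss_3 = total_loss(4000, 0.2, 0.3, image, kernel)
```

In an actual PyTorch implementation the same branching would be applied to tensor-valued loss terms inside the optimization loop, so that gradients flow only through the terms active in the current interval.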

## 5. Performance Analysis

To analyze the performance of the proposed method, extensive simulations were performed
using PyTorch ^{[28,32]} on the datasets of Levin et al. ^{[8]} and Lai et al. ^{[9]}, which are commonly used for the performance analysis of image deblurring systems.
Since the proposed method is an unsupervised learning method based on deep image prior,
no training process is required. The simulations were conducted using 32 blurry images
from the Levin dataset and 100 blurry images from the Lai dataset. The Levin dataset
contains 32 blurry test images generated by applying 8 different blur kernels to 4 original images,
and the Lai dataset contains 100 blurry test images generated by applying 4 different
blur kernels to 25 original images ^{[8,9]}. In our simulations, learning was performed to minimize the total loss over 5,000
iterations using the Adam optimizer ^{[33]}. The step size started at $10^{-4}$ and was halved at 2,000, 3,000, and
4,000 iterations.
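The step size schedule described above can be written as a small function. This is a sketch that assumes 0-based iteration indices and that each halving takes effect at the stated iteration; the function name `step_size` is hypothetical.

```python
def step_size(iteration, base=1e-4, milestones=(2000, 3000, 4000), factor=0.5):
    """Step size in effect at a given 0-based iteration."""
    drops = sum(1 for m in milestones if iteration >= m)
    return base * factor ** drops


# Step sizes at the start of each segment of the 5,000-iteration run.
schedule = [step_size(i) for i in (0, 2000, 3000, 4000)]
```

In PyTorch, a schedule of this shape is commonly expressed with `torch.optim.lr_scheduler.MultiStepLR` using `milestones=[2000, 3000, 4000]` and `gamma=0.5`; whether the authors used that scheduler is not stated in the paper.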

First, the deblurring performance was analyzed when the various loss functions described in Section 4 were used for the learning, and the results are presented in Table 1. The table shows that using both MAE and $1-\mathrm{SSIM}$ as data fidelity yields the highest PSNR and ties with $1-\mathrm{SSIM}$ alone for the highest SSIM. As explained in Section 4, we consider that the overall deblurring performance is improved by considering both the average error and the difference in the structural similarity of the image together. We consider that MAE is more appropriate than MSE as the average error, since outliers can have a more serious impact on MSE in the learning process of this system compared to the case of normal supervised learning.

##### Table 1. Image deblurring performance for various loss functions (average for 32 images in the dataset of Levin et al. ^{[8]}; total 5,000 iterations; average error (MAE or MSE) is used during the initial 1,000 iterations for all loss functions).

| Total loss | PSNR (dB) | SSIM |
|---|---|---|
| MSE | 31.954 | 0.907 |
| MAE | 32.326 | 0.914 |
| 1−SSIM | 32.772 | 0.915 |
| MAE+(1−SSIM) | 32.848 | 0.915 |
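For reference, the PSNR values reported in the tables follow the usual definition based on the mean square error between the ground-truth and restored images. The sketch below assumes images flattened to lists and normalized to the range [0, 1], consistent with the constraint in Eq. (3); the `peak` parameter and the function name are assumptions.

```python
import math


def psnr(reference, restored, peak=1.0):
    """Peak signal-to-noise ratio in dB for flattened images."""
    mse = sum((p - q) ** 2 for p, q in zip(reference, restored)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


reference = [0.0, 0.5, 1.0, 0.25]
restored = [0.1, 0.6, 0.9, 0.35]   # every pixel is off by 0.1, so MSE is 0.01
value = psnr(reference, restored)  # about 20 dB
```

Because PSNR is logarithmic in the MSE, a gain of about 0.5 dB corresponds to roughly an 11% reduction in mean square error.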

Next, the performance was analyzed when the image prior and the kernel prior were used together for the total loss function. The learning iterations were divided into several distinct intervals, and the performance was evaluated using various loss functions for each interval. After extensive simulations, the selected performance analysis results are summarized in Table 2. The proposed method shows the best performance using the optimized weights for the image prior and the kernel prior in the second interval of learning. The total variation regularization of the image and the energy regularization of the blur kernel were used to focus on the salient parts of the image and to help with the point spread characteristic of the blur kernel in the second interval of learning, respectively. When the additive noise $n$ in Eq. (1) exists, additional performance improvement can be expected if the total variation is used as the image prior in the third learning interval. However, the exact noise level should be estimated to use the total variation as the image prior to reduce the additive noise $n$.

##### Table 2. Image deblurring performance for various multi-losses using priors (average for 32 images in the dataset of Levin et al.[8]; total 5,000 iterations).

##### Table 3. Image deblurring performance for various image deblurring methods (average for 32 images in the dataset of Levin et al. ^{[8]}).

| Deblurring method | PSNR (dB) | SSIM |
|---|---|---|
| Cho et al. | 30.566 | 0.897 |
| Pan et al. | 32.691 | 0.928 |
| Levin et al. | 31.089 | 0.915 |
| Sun et al. | 32.991 | 0.933 |
| Zuo et al. | 32.662 | 0.933 |
| Krishnan et al. | 29.888 | 0.867 |
| Ren et al. (SelfDeblur) | 33.068 | 0.931 |
| Proposed method | 33.651 | 0.930 |

Next, to compare the performance of the proposed method with several state-of-the-art deblurring methods, the numerical results are presented in Tables 3 and 4, while the visual comparisons are given in Figs. 4-7. Tables 3 and 4 show that the proposed method achieves the highest PSNR on both datasets, while its SSIM is comparable to, but not the highest among, the compared methods. Figs. 4-7 show that the overall images were restored better by the proposed method than by the conventional methods. In particular, the details restored by the proposed method were clearer than those restored by the conventional methods.
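The margins over SelfDeblur can be read directly from Tables 3 and 4. The short sketch below only reproduces that arithmetic; the values are copied from the tables, and the dictionary layout and the helper name `gains` are illustrative.

```python
# (PSNR in dB, SSIM) pairs copied from Tables 3 and 4.
results = {
    "Levin": {"SelfDeblur": (33.068, 0.931), "Proposed": (33.651, 0.930)},
    "Lai": {"SelfDeblur": (21.138, 0.763), "Proposed": (21.792, 0.742)},
}


def gains(dataset):
    """Difference (proposed minus SelfDeblur) in PSNR and SSIM."""
    base_psnr, base_ssim = results[dataset]["SelfDeblur"]
    new_psnr, new_ssim = results[dataset]["Proposed"]
    return round(new_psnr - base_psnr, 3), round(new_ssim - base_ssim, 3)


levin_gain = gains("Levin")  # PSNR up by 0.583 dB, SSIM down by 0.001
lai_gain = gains("Lai")      # PSNR up by 0.654 dB, SSIM down by 0.021
```

The PSNR improves on both datasets, while the SSIM is slightly lower than that of SelfDeblur, most visibly on the Lai dataset.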

##### Fig. 4. Visual comparison on the dataset of Levin et al. ^{[8]} (the eight photos are arranged in the following order from top left to bottom right: ground truth, blurry, Pan et al. ^{[3]}, Levin et al. ^{[8]}, Sun et al. ^{[10]}, Zuo et al. ^{[12]}, SelfDeblur ^{[28]}, the proposed method).

##### Fig. 5. Visual comparison on the dataset of Levin et al. ^{[8]} (the eight photos are arranged in the following order from top left to bottom right: ground truth, blurry, Pan et al. ^{[3]}, Levin et al. ^{[8]}, Sun et al. ^{[10]}, Zuo et al. ^{[12]}, SelfDeblur ^{[28]}, the proposed method).

##### Fig. 6. Visual comparison on the dataset of Lai et al. ^{[9]} (the eight photos are arranged in the following order from top left to bottom right: ground truth, blurry, Cho et al. ^{[1]}, Pan et al. ^{[3]}, Perrone et al. ^{[13]}, Michaeli et al. ^{[14]}, SelfDeblur ^{[28]}, the proposed method).

##### Fig. 7. Visual comparison on the dataset of Lai et al. ^{[9]} (the eight photos are arranged in the following order from top left to bottom right: ground truth, blurry, Cho et al. ^{[1]}, Pan et al. ^{[3]}, Perrone et al. ^{[13]}, Michaeli et al. ^{[14]}, SelfDeblur ^{[28]}, the proposed method).

##### Table 4. Image deblurring performance for various image deblurring methods (average for 100 images in the dataset of Lai et al. ^{[9]}).

| Deblurring method | PSNR (dB) | SSIM |
|---|---|---|
| Cho et al. | 17.905 | 0.556 |
| Pan et al. | 21.588 | 0.746 |
| Xu et al. | 20.753 | 0.734 |
| Perrone et al. | 19.806 | 0.699 |
| Michaeli et al. | 19.398 | 0.591 |
| Ren et al. (SelfDeblur) | 21.138 | 0.763 |
| Proposed method | 21.792 | 0.742 |

## 6. Conclusion

In this paper, a deep learning method for blind image deblurring based on DIP was studied. Various loss functions and priors for deep learning were studied and evaluated for the system, in which a blur kernel generator and an image generator are learned with only one blurry image. Since stable learning is important for such a system, we proposed a learning method to stabilize and maximize the learning performance. In the proposed learning method, different loss functions were used for three distinct learning intervals. To stabilize and maximize the learning performance, the image and the kernel priors were used, and structural similarity was considered together with the mean error in the total loss function. Extensive simulations showed that the proposed method produced higher PSNR than the conventional methods, with comparable SSIM. The proposed method improves the convergence stability and the deblurring performance of image deblurring based on DIP, an unsupervised deep learning approach that is still in the initial stage of research.

### ACKNOWLEDGMENTS

This study was supported by Research Fund 2021 of The Catholic University of Korea and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2017R1D1A1B03030585).

### REFERENCES

## Author

Changwoo Lee received a BSc and an MSc in control and instrumentation engineering from Seoul National University. After receiving a PhD in image processing from Seoul National University in 1996, he worked as a senior researcher at Samsung Electronics. He is currently a professor in the School of Information, Communications and Electronics Engineering, the Catholic University of Korea. His current research interests are image processing and deep learning.

Jinwon Choi is a student in the School of Information, Communications and Electronics Engineering, the Catholic University of Korea. His current research interests are image processing and deep learning.