Single Blind Image Super-Resolution Using Deep Learning
In this story, I describe a super-resolution model named USRNet+DAN. It combines the estimator of the DAN model with the USRNet model, so that a single trained network performs two tasks: blur kernel estimation and single-image super-resolution.
1. USRNet+DAN Network Architecture
This study uses the basic USRNet model, so the modules of USRNet are described first. The Estimator module is explained afterwards.
1.1. Unfolding Optimization
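The maximum a posteriori (MAP) framework states that the HR image can be obtained by minimizing the following energy function:
$$E(\mathbf{x}) = \frac{1}{2\sigma^2}\left\|\mathbf{y} - (\mathbf{x} \otimes \mathbf{k})\downarrow_s\right\|^2 + \lambda\,\Phi(\mathbf{x}) \tag{1}$$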
In Eq. (1), 1/(2σ²)‖y − (x ⊗ k)↓s‖² is the data term, Φ(x) is the prior term, and λ is the trade-off parameter. USRNet uses the half-quadratic splitting (HQS) algorithm to unfold Eq. (1), because HQS is simple and converges quickly. HQS rewrites Eq. (1) by introducing an auxiliary variable z:
$$E_\mu(\mathbf{x}, \mathbf{z}) = \frac{1}{2\sigma^2}\left\|\mathbf{y} - (\mathbf{z} \otimes \mathbf{k})\downarrow_s\right\|^2 + \lambda\,\Phi(\mathbf{x}) + \frac{\mu}{2}\left\|\mathbf{z} - \mathbf{x}\right\|^2 \tag{2}$$
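Here z is the auxiliary variable introduced by HQS and µ is a penalty parameter. The problem is solved by iterating over the subproblems for z and x:
$$\mathbf{z}_k = \arg\min_{\mathbf{z}} \left\|\mathbf{y} - (\mathbf{z} \otimes \mathbf{k})\downarrow_s\right\|^2 + \mu\sigma^2\left\|\mathbf{z} - \mathbf{x}_{k-1}\right\|^2 \tag{3}$$
$$\mathbf{x}_k = \arg\min_{\mathbf{x}} \frac{\mu}{2}\left\|\mathbf{z}_k - \mathbf{x}\right\|^2 + \lambda\,\Phi(\mathbf{x}) \tag{4}$$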
According to Eq. (3), µ must be large enough for x and z to be approximately equal at the fixed point. A large µ, however, slows convergence, and fast convergence is one of the reasons for choosing HQS in the first place. USRNet therefore increases µ from one iteration to the next.
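To make the alternation concrete, here is a minimal sketch of HQS on a toy problem. It is not the USRNet setting: I assume an identity degradation (no blur, no downsampling) and swap in a simple L1 prior, Φ(x) = ‖x‖₁, so that both subproblems have one-line solutions. The function names, the starting value of µ, and its growth factor are illustrative choices of mine.

```python
def soft_threshold(v, t):
    """Proximal operator of t*|x|: solves the x-subproblem for an L1 prior."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def hqs_l1_denoise(y, sigma, lam, mu0=1.0, growth=2.0, iters=12):
    """Toy HQS for min_x 1/(2*sigma^2)*||y - x||^2 + lam*||x||_1 (elementwise)."""
    x = list(y)  # x_0 is initialised with the observation
    z = list(y)
    mu = mu0
    for _ in range(iters):
        alpha = mu * sigma ** 2
        # z-subproblem (data term): closed form for an identity degradation
        z = [(yi + alpha * xi) / (1.0 + alpha) for yi, xi in zip(y, x)]
        # x-subproblem (prior term): soft-thresholding with threshold lam / mu
        x = [soft_threshold(zi, lam / mu) for zi in z]
        # increase the penalty so that z and x are pulled together
        mu *= growth
    return x, z


y = [3.0, -2.0, 0.2]
x, z = hqs_l1_denoise(y, sigma=1.0, lam=0.5)
# for this toy problem the exact minimiser is soft_threshold(y_i, lam * sigma^2)
exact = [soft_threshold(v, 0.5) for v in y]
```

With a growing µ, x and z end up almost identical and close to the exact minimiser. How close depends on how quickly µ grows, which is the trade-off described above.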
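Eq. (3) contains the data term and Eq. (4) contains the prior term. Assuming that the convolution is carried out with circular boundary conditions, Eq. (3) can be solved with the fast Fourier transform (FFT). In its code, USRNet uses torch.rfft provided by PyTorch to perform the FFT. Notably, Eq. (3) has a closed-form solution:
$$\mathbf{z}_k = \mathcal{F}^{-1}\left(\frac{1}{\alpha_k}\left(\mathbf{d} - \overline{\mathcal{F}(\mathbf{k})} \odot_s \frac{(\mathcal{F}(\mathbf{k})\,\mathbf{d})\Downarrow_s}{(\overline{\mathcal{F}(\mathbf{k})}\,\mathcal{F}(\mathbf{k}))\Downarrow_s + \alpha_k}\right)\right) \tag{5}$$
where α_k = µ_k σ², F(·) and F⁻¹(·) denote the FFT and the inverse FFT, the overline denotes the complex conjugate, ⊙s is element-wise multiplication applied to each of the s×s distinct blocks, and ⇓s is the distinct block downsampler, which averages the s×s distinct blocks.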
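Here, the variable d is defined as
$$\mathbf{d} = \overline{\mathcal{F}(\mathbf{k})}\,\mathcal{F}(\mathbf{y}\uparrow_s) + \alpha_k\,\mathcal{F}(\mathbf{x}_{k-1})$$
where ↑s denotes standard s-fold upsampling, which fills the new entries with zeros.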
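The closed-form solution can be sketched in a few lines. This is my own NumPy illustration for a single-channel image, not the USRNet code; as far as I know, torch.rfft and torch.irfft have been replaced by the torch.fft module in recent PyTorch versions, which is one more reason to stay with NumPy here. The helper names kernel_to_otf, block_mean and data_step are hypothetical, and the image height and width are assumed to be divisible by s.

```python
import numpy as np


def kernel_to_otf(k, shape):
    """Zero-pad kernel k to `shape`, move its centre to the origin, take the 2-D FFT."""
    otf = np.zeros(shape)
    kh, kw = k.shape
    otf[:kh, :kw] = k
    otf = np.roll(otf, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(otf)


def block_mean(F, s):
    """Average the s x s distinct blocks of F (downsampling seen in the Fourier domain)."""
    H, W = F.shape
    return F.reshape(s, H // s, s, W // s).mean(axis=(0, 2))


def data_step(x_prev, y, k, s, alpha):
    """Minimise ||y - (z conv k) downsampled by s||^2 + alpha * ||z - x_prev||^2 over z."""
    H, W = x_prev.shape
    Fk = kernel_to_otf(k, (H, W))
    Fk_conj = np.conj(Fk)
    # s-fold upsampling of y: new entries are filled with zeros
    y_up = np.zeros((H, W))
    y_up[::s, ::s] = y
    d = Fk_conj * np.fft.fft2(y_up) + alpha * np.fft.fft2(x_prev)
    numerator = block_mean(Fk * d, s)
    denominator = block_mean(np.abs(Fk) ** 2, s) + alpha
    # np.tile applies the low-resolution quotient to every distinct block
    Fz = (d - Fk_conj * np.tile(numerator / denominator, (s, s))) / alpha
    return np.fft.ifft2(Fz).real
```

A quick sanity check: if y was produced from some image x by circular blurring and s-fold downsampling without noise, then calling data_step with x_prev = x should return x itself, because both terms of the objective are already zero there.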
1.2. Deep Unfolding Network
Section 1.1 explained the unfolding optimization on which the USRNet design is based. The optimization solves Eq. (3), the data subproblem, and Eq. (4), the prior subproblem, iteratively. The image is therefore passed back and forth between a data module and a prior module. USRNet also includes a hyper-parameter module that controls the outputs of the data module and the prior module.
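The control flow of this design can be sketched as a short loop. This is only a skeleton under my own naming: data_step, prior_step and hyper are placeholders for the three modules and are passed in as callables, and the number of iterations is left as a parameter.

```python
def unfold(y, kernel, s, sigma, x0, data_step, prior_step, hyper, num_iters):
    """Alternate between the data module and the prior module for num_iters steps."""
    # the hyper-parameter module provides one (alpha, beta) pair per iteration
    alphas, betas = hyper(sigma, s, num_iters)
    x = x0
    for alpha_k, beta_k in zip(alphas, betas):
        z = data_step(x, y, kernel, s, alpha_k)  # data subproblem, Eq. (3)
        x = prior_step(z, beta_k)                # prior subproblem, Eq. (4)
    return x
```

Passing the modules as arguments keeps the sketch independent of how each one is implemented, which also mirrors the decoupling between the data and prior modules.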
1.2.1. Data Module
Eq. (5) is the closed-form solution of the data subproblem, and the data module is the USRNet module that implements it. Its purpose is to obtain a clearer HR image by minimizing a weighted combination of the data term ‖y − (z ⊗ k)↓s‖² and the quadratic regularization term ‖z − x_{k-1}‖², where the weight α_k is an output of the hyper-parameter module. In this sense the module cleans up the degraded LR observation. The data module corresponds to the degradation model (i.e., y = (x ⊗ k)↓s + n). It takes the scale factor and the blur kernel as explicit inputs, which is an advantage of this module. This multi-input module is designed by hand in USRNet rather than learned. Thus, Eq. (5) can be abbreviated as
$$\mathbf{z}_k = \mathcal{D}(\mathbf{x}_{k-1}, s, \mathbf{k}, \mathbf{y}, \alpha_k)$$
The data module takes the scale factor and the blur kernel as inputs because the data term represents the degradation model. In this way the module also helps the solution by constraining it to be consistent with the degradation. The initial image x_0 is obtained by interpolating the image y with the scale factor, using the simplest option, nearest-neighbor interpolation. The data module contains no trainable parameters, which leads to better generalizability because it is completely decoupled from the prior module. In many experiments, instead of the interpolated x_0, an x_0 produced by filtering the image with a super-resolution model or a denoiser is given to the data module. The aim is to pass a cleaner image to the data module, the basic module of USRNet, so that the model remains successful even when the scale factor s, the blur kernel k, or the noise n causes heavier degradation. The FFT operator is implemented with torch.rfft and the inverse FFT operator with torch.irfft, both provided by PyTorch.
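The degradation model and the nearest-neighbor initialization can be sketched as follows. This is a single-channel NumPy illustration with names of my own; the blur uses circular boundary conditions, and the downsampler keeps the upper-left pixel of every s×s block, which is an assumption about the convention.

```python
import numpy as np


def degrade(x, k, s, sigma, rng):
    """y = (x conv k) downsampled by s, plus Gaussian noise of standard deviation sigma."""
    kh, kw = k.shape
    k_pad = np.zeros_like(x, dtype=float)
    k_pad[:kh, :kw] = k
    # shift the kernel centre to the origin so the blur does not translate the image
    k_pad = np.roll(k_pad, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    blurred = np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(k_pad)).real
    y = blurred[::s, ::s]
    return y + sigma * rng.standard_normal(y.shape)


def nearest_neighbor_init(y, s):
    """x_0: every LR pixel is repeated over an s x s block."""
    return np.repeat(np.repeat(y, s, axis=0), s, axis=1)
```

Here x_0 has the spatial size of the HR image, so it can be passed directly to the first data step.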
1.2.2. Prior Module
The purpose of the prior module is to obtain a cleaner HR image. It takes two inputs: z_k, the output of the data module, and β_k, an output of the hyper-parameter module that represents the noise level map. Because the ResUNet denoiser receives the noise level map together with the data module output, a single model can handle different noise levels, which reduces the total number of parameters. ResUNet is a deep CNN. The module can be written as:
$$\mathbf{x}_k = \mathcal{P}(\mathbf{z}_k, \beta_k)$$
The denoiser used by USRNet, ResUNet, is a very successful denoising model. Models such as EDSR, MDSR, and VDSR have shown that residual layers give good results in deep networks, and ResUNet was created by adding residual blocks to the successful U-Net model. Using the prior module and the data module iteratively gives the USRNet design. Through these iterations the LR image is cleaned progressively, so the resulting HR image is cleaner. x_k, the output of the prior module, is used as the input of the data module in the next iteration.

ResUNet has four scales, each with an identity skip connection between the downscaling and upscaling operations. The layers have 64, 128, 256, and 512 channels from the first scale to the last, so each scale uses a different number of channels. Downscaling uses 2×2 strided convolution and upscaling uses 2×2 transposed convolution. Each residual block consists of two 3×3 convolution layers with a ReLU activation in between, and each scale uses 2 residual blocks, in both the downscaling and the upscaling path.
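As a small illustration of these building blocks, the sketch below implements one residual block in plain NumPy and traces the feature-map shapes over the four scales. It is not the ResUNet implementation: the weights are plain arrays, biases are omitted, zero padding is my assumption, and the function names are hypothetical.

```python
import numpy as np


def conv3x3(x, w):
    """3x3 convolution with zero padding. x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            # accumulate the contribution of kernel tap (i, j) over all input channels
            out += np.einsum('oc,chw->ohw', w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out


def residual_block(x, w1, w2):
    """Two 3x3 convolutions with a ReLU in between, plus the identity shortcut."""
    return x + conv3x3(np.maximum(conv3x3(x, w1), 0.0), w2)


def scale_shapes(height, width, channels=(64, 128, 256, 512)):
    """(channels, height, width) of the feature maps at each of the four scales."""
    shapes = []
    for i, c in enumerate(channels):
        if i > 0:
            # a 2x2 convolution with stride 2 halves each spatial dimension
            height, width = height // 2, width // 2
        shapes.append((c, height, width))
    return shapes
```

Since the input is halved three times, the input height and width should be divisible by 8 for the shapes to line up on the way back up.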
1.2.3. Hyper-parameter Module
This module is designed to control the outputs of the data module and the prior module. It acts as a slide bar that steers those outputs. Its outputs are α_k, which is an input of the data module, and β_k, which is an input of the prior module. To produce α_k and β_k, the hyper-parameter module takes the scale factor s and the noise level σ as inputs. The equation of this module is as follows:
$$[\boldsymbol{\alpha}, \boldsymbol{\beta}] = \mathcal{H}(\sigma, s)$$
The module performs its slide bar function through the outputs α = [α_1, α_2, . . . , α_K] and β = [β_1, β_2, . . . , β_K], one pair of values for each of the K iterations. The hyper-parameter module consists of three fully connected layers, with ReLU as the activation function of the first two layers and Softplus as the activation function of the last. The number of hidden nodes in each layer is 64.
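A minimal NumPy sketch of such a module is given below. The weights are random and untrained, so the numbers it produces mean nothing; the point is only the shape of the computation. I assume ReLU after the first two layers and Softplus after the third, an output of size 2K that is split into α and β, and hypothetical helper names.

```python
import numpy as np


def softplus(v):
    """log(1 + exp(v)), computed stably; the result is positive."""
    return np.logaddexp(0.0, v)


def random_weights(K, hidden=64, seed=0):
    """Untrained weights for a 2 -> hidden -> hidden -> 2K network."""
    rng = np.random.default_rng(seed)
    sizes = [(hidden, 2), (hidden, hidden), (2 * K, hidden)]
    return [(0.1 * rng.standard_normal(sz), np.zeros(sz[0])) for sz in sizes]


def hyper_parameter_module(sigma, s, weights, K):
    """Map the noise level sigma and scale factor s to K alphas and K betas."""
    (W1, b1), (W2, b2), (W3, b3) = weights
    h = np.array([sigma, float(s)])
    h = np.maximum(W1 @ h + b1, 0.0)   # fully connected + ReLU
    h = np.maximum(W2 @ h + b2, 0.0)   # fully connected + ReLU
    out = softplus(W3 @ h + b3)        # fully connected + Softplus keeps outputs positive
    return out[:K], out[K:]
```

The Softplus at the end guarantees positive α and β, which matters because α_k appears as a divisor in the data step.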
1.3. Estimator Module
This module addresses the blind super-resolution problem and is taken from the deep alternating network (DAN) study. The blind super-resolution model was obtained by combining this module with the modules of USRNet, which is very successful in non-blind settings. The Estimator module takes both the LR image and the SR image as input, which makes blur kernel estimation much easier.
The Estimator module works as follows. The SR image given to the module is downsampled by a convolutional layer according to the specified scale factor. The features obtained from the LR image and the SR image are then passed to the body of the model, which consists of conditional residual blocks (CRBs). Finally, the features are aggregated with global average pooling to generate the elements of the predicted kernel. Note that the estimator predicts the PCA (principal component analysis) representation of the kernel and passes it to the restorer (i.e., to the USRNet model), because the kernel is reduced by PCA. The proposed estimator has 5 CRBs. Both inputs of a CRB, the basic input and the conditional input, have 32 channels. The input of the CRB body is the concatenation of the basic input and the conditional input. The CRB body consists of two 3×3 convolution layers with a LeakyReLU activation in between. The output of the convolution layers passes through a channel attention layer, and f_basic is added to the result to give the output of the CRB. The body of the CRB thus serves as the residual mapping function.
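The PCA reduction of a kernel and the final pooling step can be sketched like this. The code is my own NumPy illustration, not the DAN implementation: the PCA basis is fitted with an SVD on a bank of flattened kernels, and the function names and the number of components are assumptions.

```python
import numpy as np


def fit_pca(kernels, dim):
    """kernels: (N, size*size) flattened kernels. Returns the mean and the top `dim` directions."""
    mean = kernels.mean(axis=0)
    _, _, vt = np.linalg.svd(kernels - mean, full_matrices=False)
    return mean, vt[:dim]


def encode(kernel, mean, basis):
    """Project a kernel onto the PCA basis to get its low-dimensional code."""
    return basis @ (kernel.ravel() - mean)


def decode(code, mean, basis, size):
    """Reconstruct a size x size kernel from its PCA code."""
    return (mean + basis.T @ code).reshape(size, size)


def global_average_pool(features):
    """features: (C, H, W). One value per channel, e.g. one element of the kernel code."""
    return features.mean(axis=(1, 2))
```

For example, a 21×21 kernel has 441 entries, and encode reduces it to `dim` numbers; decode maps such a code back to a full kernel whenever a component needs one.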
Our network uses two different models: the estimator, which estimates the blur kernel, and USRNet, which restores the image. The estimator takes the LR image and the SR image as input. USRNet takes the LR image, the estimated blur kernel, the scale factor, and the noise level as input. The estimated blur kernel fed to USRNet changes at every iteration, and likewise the SR image fed to the estimator is updated at every iteration. The relationship between the estimator and USRNet is therefore very important in the proposed model. The key issue is that if the estimator ignored the SR image, or the restorer ignored the blur kernel input, their outputs would stay fixed across iterations and the iterations would bring no improvement. The conditional residual block (CRB) is therefore used to couple the estimator and USRNet. The CRB is defined as:
$$f_{out} = R(\mathrm{Concat}([f_{basic}, f_{cond}])) + f_{basic}$$
Here R(·) denotes the residual mapping function of the CRB and Concat([·, ·]) denotes concatenation. f_basic is the input that stays the same across iterations, and f_cond is the input that changes at each iteration. The estimator is built from CRBs. Thus the blur kernel predicted by the estimator, which changes with each iteration, enters the USRNet model, and the SR image produced by the restorer, which also changes with each iteration, enters the estimator. This overcomes the problem described above.
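The structure of a CRB can be sketched as follows. To keep the example short I replace the two 3×3 convolutions with simple per-pixel channel mixing (equivalent to 1×1 convolutions), so this is a simplification and not the DAN block. The LeakyReLU slope, the channel attention reduction ratio, the layer widths, and all names are assumptions of mine.

```python
import numpy as np


def leaky_relu(v, slope=0.2):
    return np.where(v > 0, v, slope * v)


def channel_attention(f, w_down, w_up):
    """Rescale each channel of f (C, H, W) by a gate in (0, 1) computed from its mean."""
    squeezed = f.mean(axis=(1, 2))                       # global average pooling
    hidden = np.maximum(w_down @ squeezed, 0.0)          # reduce + ReLU
    gate = 1.0 / (1.0 + np.exp(-(w_up @ hidden)))        # expand + sigmoid
    return f * gate[:, None, None]


def init_crb_params(rng, channels=32, reduction=4):
    """Random, untrained weights for the simplified block."""
    w1 = 0.1 * rng.standard_normal((2 * channels, 2 * channels))
    w2 = 0.1 * rng.standard_normal((channels, 2 * channels))
    w_down = 0.1 * rng.standard_normal((channels // reduction, channels))
    w_up = 0.1 * rng.standard_normal((channels, channels // reduction))
    return w1, w2, w_down, w_up


def crb(f_basic, f_cond, params):
    """f_out = R(Concat([f_basic, f_cond])) + f_basic, with a simplified R."""
    w1, w2, w_down, w_up = params
    f = np.concatenate([f_basic, f_cond], axis=0)        # Concat along channels
    f = leaky_relu(np.einsum('oc,chw->ohw', w1, f))      # stand-in for the first 3x3 conv
    f = np.einsum('oc,chw->ohw', w2, f)                  # stand-in for the second 3x3 conv
    f = channel_attention(f, w_down, w_up)
    return f + f_basic                                   # residual connection
```

Because f_cond enters the residual branch, the block's output changes whenever the conditional input changes, while f_basic is always carried through unchanged by the shortcut.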
2. Experiments
Experiments were performed using different scale factors and different blur kernels. In the first experiment, the DAN+USRNet model was trained separately for the ×2 and ×4 scale factors. The size of the blur kernels used in training was fixed at 21×21, and randomly generated Gaussian blur kernels were used. In the second experiment, the DAN+USRNet model was trained separately for the ×2, ×3, and ×4 scale factors. In this training, 100,000 different Gaussian blur kernels and 100,000 motion blur kernels of size 21×21 were used.
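Random Gaussian kernels of this size can be generated along the following lines. This is a sketch of one common recipe, not necessarily the one used in the experiments: the range of standard deviations, the use of anisotropic kernels with a random rotation, and the function names are all assumptions.

```python
import numpy as np


def anisotropic_gaussian_kernel(size, sigma_x, sigma_y, theta):
    """Normalised Gaussian kernel with axis widths sigma_x, sigma_y rotated by theta."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    c, s = np.cos(theta), np.sin(theta)
    u = c * xx + s * yy        # coordinates in the rotated frame
    v = -s * xx + c * yy
    k = np.exp(-0.5 * ((u / sigma_x) ** 2 + (v / sigma_y) ** 2))
    return k / k.sum()         # the kernel sums to 1, so it preserves brightness


def random_gaussian_kernel(rng, size=21, sigma_range=(0.6, 5.0)):
    """Draw random widths and a random orientation, then build the kernel."""
    sigma_x, sigma_y = rng.uniform(*sigma_range, size=2)
    theta = rng.uniform(0.0, np.pi)
    return anisotropic_gaussian_kernel(size, sigma_x, sigma_y, theta)
```

Setting sigma_x equal to sigma_y gives an isotropic kernel, so the same function covers both kinds of Gaussian kernel.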
2.1. Training and Testing Information
DIV2K, an RGB image set with a wide variety of content that is widely used in super-resolution studies, was used as the training dataset. The DIV2K training set consists of 800 high-resolution images. To increase the amount of data and reduce overfitting, data augmentation was applied while preparing the dataset. LR images were obtained from the DIV2K HR images via y = (x ⊗ k)↓s + n. Care was taken to produce LR images with different blur kernels and different noise levels: the images were blurred with randomly generated blur kernels and degraded with randomly chosen noise levels. In this way, our models were trained on the effects of different blur kernels and different noise levels. The same approach was applied to the scale factors. The blur kernel size was set to 25×25 in the non-blind super-resolution studies and to 21×21 in the blind super-resolution studies.
We chose the widely used color BSD68 dataset to evaluate our work quantitatively. The dataset consists of 68 images with small structures and fine textures. To obtain LR images from this test set via y = (x ⊗ k)↓s + n, we need to provide blur kernels and noise levels. To evaluate how well our work handles different blur kernels, we considered 12 representative and diverse blur kernels: 4 isotropic Gaussian kernels, 4 anisotropic Gaussian kernels, and 4 motion blur kernels.
The results in the above table were obtained on the BSD68 test dataset. In training, 100,000 different Gaussian blur kernels and 100,000 different motion blur kernels of size 21×21 were used. For testing, 12 different blur kernels were used.
The results in the above table were obtained on the BSD68 test dataset. In training, randomly generated Gaussian blur kernels of size 21×21 were used. For testing, 12 different blur kernels were used.
Although this study draws on a lot of research, I particularly recommend reading "https://arxiv.org/pdf/2010.02631.pdf" and "https://openaccess.thecvf.com/content_CVPR_2020/papers/Zhang_Deep_Unfolding_Network_for_Image_Super-Resolution_CVPR_2020_paper.pdf". USRNet+DAN is a successful study in the field of blind super-resolution. You can find the source code at "https://github.com/BoraCoban".