Motion deblurring is a fundamental problem in computer vision and image processing. It is divided into non-blind and blind deblurring according to whether the blur kernel is known. Methods that depend on an estimated blur kernel tend to produce artifacts when the estimate is inaccurate, so end-to-end blind deblurring, which restores the image without estimating the kernel, is the approach adopted here. The purpose of this paper is to further improve the sharpness of blurred images, restore contour edges, and strengthen the repair of texture details. Building on the multi-scale convolutional neural network, a multi-scale residual network is proposed that extracts image features comprehensively, enhances feature fusion, and constrains image generation by combining a multi-scale loss function with an adversarial loss function. The algorithm is evaluated by the peak signal-to-noise ratio (PSNR), the structural similarity (SSIM), and the restoration time of the generated image relative to the clear image. On the GOPRO test set, the algorithm improves the average PSNR and reduces the restoration time, and it successfully recovers detail lost to motion blur. The network has a simple structure, strong robustness, and good restoration quality, and is suitable for handling image degradation caused by motion blur.

Humans obtain a large amount of information through the visual system; studies have shown that it accounts for about 70% of the information people receive. The acquisition, processing and use of image information is therefore particularly important. The importance of image restoration technology was already apparent in the space exploration of 60 years ago: the images sent back to Earth were degraded by the imaging technology of the time, the non-ideal shooting environment, the relative movement between objects and the camera shake [

There are three main types of blur: Gaussian blur, defocus blur and motion blur. Gaussian blur arises when each pixel in the image is spread outward according to a Gaussian distribution and the spread values are superimposed. The center of the image is more blurred, and the edges are looser. Defocus blur is caused by differences in depth of field during photographing: when some or all of the objects do not lie in the focal plane of the imaging system, local or global defocus blur appears in the image. It is mainly caused by inaccurate camera focusing, which degrades objects at different depths of the image to different degrees [

Depending on whether the blur kernel is known, deblurring is divided into blind deblurring and non-blind deblurring. Non-blind deblurring depends on the blur kernel, so any deviation in the kernel estimate produces artifacts in the image, and it can only restore a limited range of blur. Blind deblurring does not rely on the estimation of the blur kernel and achieves end-to-end deblurring, but because the problem is ill-posed, image details tend to be lost. The color saturation of the image should also be enhanced to meet human visual needs. Therefore, this article focuses on restoring the contour edges of the image. A multi-scale residual module is added to the network, and convolution kernels of different sizes are used to extract more image features through information sharing between the shallow and deep parts of the network. Based on the inspiration of DeblurGAN [

There are many causes of image blurring. The resolution of the capture device, lighting conditions, atmospheric motion, and the photographer's skill can all produce blur of different degrees and types in the captured pictures. According to the type of blur, blurred images can be divided into motion blur, defocus blur, Gaussian blur and so on. This article mainly analyzes the image degradation model of motion blur. Motion blur is produced by the relative displacement between the device and the subject during the exposure time. Many uncontrollable factors cause motion blur, such as sun exposure, camera shake, and atmospheric movement. Motion blur restoration can be widely used in fields such as traffic monitoring, medical imaging, and target detection. Therefore, restoring clear images is an active topic in computer vision today.

The degradation model of motion blur is shown below:

$$b = k \otimes l + n$$

where b is the blurred image, k is the clear image, l is the blur kernel (also called the point spread function), $\otimes$ denotes convolution, and n is additive noise. The blur kernel is a convolution kernel whose application produces the blur effect in the image. The research of this paper does not estimate the blur kernel l; it outputs clear images directly, end to end.
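The degradation model can be illustrated with a minimal sketch in plain Python. It is only an illustration under stated assumptions: the image is a small grayscale array stored as nested lists, the border is zero-padded, the noise is assumed to be Gaussian, and the names `degrade`, `sharp` and `motion_kernel` are hypothetical.

```python
import random

def degrade(sharp, kernel, noise_sigma=0.0, seed=0):
    """Return b = k (*) l + n for a 2-D grayscale image stored as nested lists.

    sharp       : clear image k
    kernel      : blur kernel l (odd height and width, entries summing to 1)
    noise_sigma : standard deviation of the additive noise n (assumed Gaussian)
    """
    rng = random.Random(seed)
    h, w = len(sharp), len(sharp[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    blurred = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    # True convolution: the kernel is flipped relative to the image.
                    yy, xx = y + ph - i, x + pw - j
                    if 0 <= yy < h and 0 <= xx < w:  # zero padding outside the image
                        acc += kernel[i][j] * sharp[yy][xx]
            noise = rng.gauss(0.0, noise_sigma) if noise_sigma > 0 else 0.0
            blurred[y][x] = acc + noise
    return blurred

# A single bright pixel smeared by a 3-pixel horizontal motion kernel.
sharp = [[0.0] * 5 for _ in range(5)]
sharp[2][2] = 1.0
motion_kernel = [[1 / 3, 1 / 3, 1 / 3]]
blurred = degrade(sharp, motion_kernel)
```

With the noise switched off, the bright pixel is spread evenly over three neighbouring pixels of the same row, which is the streak a short horizontal motion would leave.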

By modeling the motion-blurred image mathematically, motion blur removal establishes a corresponding mathematical model, extracts information from the contaminated or distorted image signal, and restores a clear image by inverting the degradation process. This work restores a clear image on the basis of blind deblurring, which is the current mainstream technology. Its principle is not to rely on blur kernel estimation, but to construct a neural network and adjust its weight parameters against a loss function until the objective function converges. In non-blind deblurring, inaccurate estimation of the blur kernel produces false contours and a large amount of noise in the image, which makes restoration very difficult.

Deblurring algorithms can be divided into non-blind deblurring and blind deblurring according to whether the blur kernel is known. Non-blind deblurring is performed under the premise that the blur kernel is known, and a clear image is obtained by deconvolving the blurred image with the blur kernel. Blind deblurring is performed under the premise that the blur kernel is unknown. The traditional blind deblurring method generally has two steps: first, the blur kernel is estimated, and then the blurred image is deconvolved with the estimated kernel to obtain a clear image. Fergus et al. [

The following figure

Generator network structure diagram

ResNet adds the input of a block to the output of its stacked layers to form the output of the block. Assume that x is the input of a block composed of two layers. x first passes through a convolutional layer and a ReLU activation, and then through a second convolutional layer to give F(x); F(x) is added to the input x, and the sum is activated by ReLU to give the output of the block. For an ordinary convolutional network the output is F(x), but in ResNet the output is H(x) = F(x) + x. This changes the learning goal: instead of fitting the complete mapping H(x), the layers fit the residual between the output and the input, and driving that residual to 0 yields the identity mapping. After the residual is introduced, the mapping is more sensitive to changes in the output. H(x) is regarded as an underlying mapping fitted by a few stacked layers (not necessarily the whole network), where x is the input of these layers. Assuming that multiple nonlinear layers can approximate complex functions, they can equally approximate the residual function H(x) − x (assuming that the input and output have the same dimensions). So we explicitly let these layers estimate a residual function F(x) := H(x) − x instead of H(x), and the original function becomes F(x) + x. Although both forms should be able to approximate the required function (as assumed), the learning difficulty is not the same. This reformulation is motivated by the counterintuitive degradation phenomenon. If the added layers can be constructed as identity mappings, the training error of a deeper model should not be higher than that of its shallower counterpart. The degradation problem suggests that the solver may have difficulty approximating identity mappings with multiple nonlinear layers.
With the residual reformulation, if the identity mapping is optimal, the solver can simply drive the weights of the nonlinear layers toward zero to approximate it. In practice, the identity mapping is unlikely to be optimal, but the reformulation helps to precondition the problem. If the optimal function is closer to the identity mapping than to the zero mapping, it is much easier for the solver to find perturbations with respect to the identity mapping than to learn a new function from scratch. Experiments show that the learned residual functions usually have small responses, which suggests that the identity mapping provides reasonable preconditioning.

The formula F(x) + x can be realized by a “shortcut connection” in a feedforward neural network. A shortcut connection skips one or more layers. In our case, the shortcut connection simply performs the identity mapping, and its output is added to the output of the stacked layers. Identity shortcut connections add neither extra parameters nor computational complexity. The complete network can still be trained end to end by SGD with backpropagation, and can be implemented with common libraries without modifying the solver.
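The identity shortcut can be sketched in a few lines of plain Python. This is an illustrative toy rather than the network used in the paper: fully connected layers stand in for the convolutions, the final ReLU after the addition is omitted so that the identity case is exact, and the helper names are hypothetical.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]

def linear(weights, bias, v):
    """Fully connected layer standing in for a convolution."""
    return [sum(w * a for w, a in zip(row, v)) + b for row, b in zip(weights, bias)]

def residual_block(x, w1, b1, w2, b2):
    """H(x) = F(x) + x, where F is two layers with a ReLU in between."""
    f = linear(w2, b2, relu(linear(w1, b1, x)))
    # Identity shortcut: element-wise addition, no extra parameters.
    return [fi + xi for fi, xi in zip(f, x)]

# With all weights at zero the residual F(x) vanishes and the block is the identity.
x = [1.0, -2.0, 3.0]
zero_w = [[0.0] * 3 for _ in range(3)]
zero_b = [0.0] * 3
identity_out = residual_block(x, zero_w, zero_b, zero_w, zero_b)

# A small non-zero residual perturbs the identity mapping only slightly.
small_w = [[0.01 if i == j else 0.0 for j in range(3)] for i in range(3)]
perturbed_out = residual_block(x, small_w, zero_b, small_w, zero_b)
```

The zero-weight case shows why the identity mapping is easy to reach in this formulation: the solver only has to push the residual branch toward zero.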

Show in Figure

Residual network structure diagram before and after modification

The figure

Multi-scale residual structure diagram

Multi-scale fusion: This part uses convolution kernels of different sizes: 1×1, 3×3 and 5×5. Kernels of different sizes extract information at different levels, and the information at different scales is passed on to the next layer of the network, so that the feature maps can be shared and passed between branches. The parts are combined by skip connections to construct a two-bypass network. In this way, the information in the bypasses can be shared, so that image features at different scales can be detected. The operation can be defined as:

Among them, w and b represent the weight and bias terms, the superscript represents the number of layers, and the subscript represents the size of the convolution kernel used in the layer. σ(

So the input and output of each first-layer convolution have M feature maps, and the input and output of each second-layer convolution have 2M feature maps. All these feature maps are concatenated and sent to a 1×1 convolutional layer, which reduces the number of feature maps to M, so the input and output of the multi-scale residual block (MSRB) have the same number of feature maps. This architecture allows multiple MSRBs to be stacked.
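The feature-map counts described above can be checked with a short bookkeeping sketch. It assumes a 3×3 and a 5×5 branch in each of the two layers, that each second-layer branch receives the concatenation of both first-layer outputs, and a hypothetical width of M = 64; none of these specifics are fixed by the text, and the function names are illustrative.

```python
def conv_params(in_ch, out_ch, k):
    """Number of weights plus biases of a k x k convolution."""
    return out_ch * in_ch * k * k + out_ch

def msrb_layers(m):
    """List (name, in_channels, out_channels, kernel) for one MSRB of width m."""
    first_in = m                 # block input: M feature maps
    first_out = m                # each first-layer branch outputs M maps
    second_in = 2 * first_out    # concatenation of the two first-layer outputs
    second_out = 2 * m           # each second-layer branch outputs 2M maps
    fuse_in = 2 * second_out     # concatenation of the two second-layer outputs
    fuse_out = m                 # the 1x1 convolution restores M maps
    return [
        ("conv1_3x3", first_in, first_out, 3),
        ("conv1_5x5", first_in, first_out, 5),
        ("conv2_3x3", second_in, second_out, 3),
        ("conv2_5x5", second_in, second_out, 5),
        ("fuse_1x1", fuse_in, fuse_out, 1),
    ]

layers = msrb_layers(64)  # M = 64 is an assumed example value
total_params = sum(conv_params(i, o, k) for _, i, o, k in layers)
for name, in_ch, out_ch, k in layers:
    print(f"{name}: {in_ch} -> {out_ch} channels, {k}x{k} kernel")
```

Because the block ends with M feature maps, the same count it started with, the shortcut addition is well defined and blocks can be chained without any adapter layers.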

Local residual learning: Inspired by residual blocks, the multi-scale residual block introduces residual learning to improve the expressive ability of the network. The expression of the local residual is as follows:

$$M_{n} = S + M_{n-1}$$

Among them, M_{(n-1)} and M_{n} represent the input and output of the MSRB respectively. The operation S+M_{(n-1)} is performed by a shortcut connection and element-wise addition. The use of local residual learning greatly reduces the computational complexity and improves the performance of the network.

With the increase of depth, the spatial expression ability of the network gradually decreases, while the semantic expression ability gradually increases [

Inspired by the loss function of GAN, the loss function used in this paper combines a multi-scale loss function with an adversarial loss function. The multi-scale loss function is computed from features at different scales and drives deblurring from coarse to fine; the adversarial loss function uses adversarial training between the generator and the discriminator to generate a clear image that is as close as possible to the real image [

1)

The MSE criterion is applied to each level of the pyramid. Therefore, the loss function is defined as follows:

Among them, the subscript k indexes the scale levels of the pyramid.
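A multi-scale MSE loss of this kind can be sketched as follows. The normalization is an assumption: each level's squared error is averaged over its own pixels, and the levels are then averaged with equal weight. The function names and the toy two-level pyramid are hypothetical.

```python
def mse(pred, target):
    """Mean squared error over all pixels of two equally sized 2-D images."""
    n = len(pred) * len(pred[0])
    total = sum((p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    return total / n

def multiscale_mse(preds, targets):
    """Average the per-level MSE over all pyramid levels k (equal weights assumed)."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same number of levels")
    return sum(mse(p, t) for p, t in zip(preds, targets)) / len(preds)

# Toy pyramid with two levels: a 2x2 image and its 1x1 coarse version.
preds = [[[1.0, 2.0], [3.0, 4.0]], [[2.5]]]
targets = [[[1.0, 2.0], [3.0, 2.0]], [[1.5]]]
loss = multiscale_mse(preds, targets)
```

Dividing each level by its own pixel count keeps the coarse levels from being drowned out by the finest one, which has far more pixels.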

2)

DISCRIMINATOR NETWORK STRUCTURE DIAGRAM

# | Layer | Weight dimension | Stride |
---|---|---|---|
1 | conv | 32×32×5×5 | 2 |
2 | conv | 64×32×5×5 | 1 |
3 | conv | 64×64×5×5 | 2 |
4 | conv | 128×64×5×5 | 1 |
5 | conv | 128×128×5×5 | 4 |
6 | conv | 256×128×5×5 | 1 |
7 | conv | 256×256×5×5 | 4 |
8 | conv | 512×256×5×5 | 1 |
9 | conv | 512×512×4×4 | 4 |
10 | fc | 512×1×1×1 | - |
11 | sigmoid | - | - |
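The kernel sizes and strides in the table can be checked against a 256×256 input with a short sketch. The padding is an assumption, since the table does not state it: 2 pixels for the 5×5 convolutions and none for the final 4×4 convolution. The names `conv_out`, `DISCRIMINATOR_CONVS` and `trace_sizes` are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

# (kernel, stride) of the nine convolutions listed in the table.
DISCRIMINATOR_CONVS = [(5, 2), (5, 1), (5, 2), (5, 1), (5, 4),
                       (5, 1), (5, 4), (5, 1), (4, 4)]

def trace_sizes(size=256):
    """Return the spatial size after the input and after each convolution."""
    sizes = [size]
    for kernel, stride in DISCRIMINATOR_CONVS:
        padding = 2 if kernel == 5 else 0  # assumption, not given in the table
        size = conv_out(size, kernel, stride, padding)
        sizes.append(size)
    return sizes

sizes = trace_sizes(256)
print(sizes)
```

Under these assumptions the last convolution yields a 1×1 map with 512 channels, which matches the 512×1×1×1 weight of the fully connected layer in row 10.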

Without the generative adversarial network, the generated image improves on the original image, but most images remain blurry, object edges are over-smoothed, and the gap to the real image is obvious. After the generative adversarial network is added, the network can further explore the gap between generated samples and real samples and further improve the visual quality of the generated image. In addition, the network improves the robustness of the algorithm [
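The adversarial term can be sketched with the standard GAN binary cross-entropy loss applied to the sigmoid output of the discriminator. This is an assumption: the text does not give the exact form of the adversarial loss or its weight relative to the multi-scale loss, and the function names are hypothetical.

```python
import math

def bce(prob, label, eps=1e-12):
    """Binary cross-entropy for one sigmoid output, clamped for numerical safety."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1.0 - label) * math.log(1.0 - prob))

def discriminator_loss(d_real, d_fake):
    """The discriminator should score real images as 1 and generated images as 0."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

def generator_adv_loss(d_fake):
    """The generator is rewarded when the discriminator scores its output as real."""
    return bce(d_fake, 1.0)

# d_fake is the discriminator's probability that a deblurred image is real.
undecided = generator_adv_loss(0.5)
fooled = generator_adv_loss(0.9)
caught = generator_adv_loss(0.1)
```

The generator's loss falls as the discriminator is fooled more often, which is what pushes the deblurred output toward the distribution of real sharp images.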

All experiments use the PyTorch deep learning framework, and the training images are processed before each training batch. First, the blurred image and the clear image are randomly cropped at the same position to 256×256 pixels. The cropped blurred image is used as the input of the generator, the output of the generator is used as the input of the discriminator, and the cropped clear image is used as the target output of the generator [
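The paired cropping step can be sketched as follows. It is a minimal illustration, assuming images stored as nested lists; the function name `paired_random_crop` and the small 6×8 test images are hypothetical, and a real pipeline would operate on tensors.

```python
import random

def paired_random_crop(blurred, sharp, size=256, rng=random):
    """Crop the same randomly chosen size x size window from both images."""
    h, w = len(blurred), len(blurred[0])
    if (h, w) != (len(sharp), len(sharp[0])):
        raise ValueError("blurred and sharp images must have the same size")
    if h < size or w < size:
        raise ValueError("images are smaller than the crop size")
    top = rng.randint(0, h - size)    # randint is inclusive at both ends
    left = rng.randint(0, w - size)

    def crop(img):
        return [row[left:left + size] for row in img[top:top + size]]

    return crop(blurred), crop(sharp)

# Small stand-in images: the blurred one is the sharp one shifted by 0.5.
sharp_img = [[float(10 * y + x) for x in range(8)] for y in range(6)]
blurred_img = [[v + 0.5 for v in row] for row in sharp_img]
crop_b, crop_s = paired_random_crop(blurred_img, sharp_img, size=4,
                                    rng=random.Random(0))
```

Sampling the offset once and applying it to both images keeps every blurred pixel aligned with its ground-truth counterpart, which the pixel-wise loss relies on.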

Neural network training needs a large amount of data. Early blurred images were obtained by convolving a clear image with a blur kernel, but blurred images produced by this simple method differ considerably from real images captured by a camera. Nah et al. proposed a new generation method that captures video with a high-speed camera and averages consecutive short-exposure frames to obtain a blurred image; for example, a GoPro Hero 4 Black camera is used to approximate a long-exposure blurred image. This method can simulate complex camera shake and object motion, and the resulting image is closer to a real blurred image. The GOPRO dataset is generated by this method and is used to train the network in this experiment. The dataset contains 3214 pairs of blurred and clear images, of which 2103 pairs are used for training and 1111 pairs for testing.
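The frame-averaging idea can be sketched in plain Python. This is a simplification that takes a plain arithmetic mean of the frames; any camera response handling used when the real dataset was built is not modeled here, and the function name `average_frames` is hypothetical.

```python
def average_frames(frames):
    """Average consecutive short-exposure frames into one long-exposure image."""
    if not frames:
        raise ValueError("at least one frame is required")
    n = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / n for x in range(w)] for y in range(h)]

def dot_frame(position, width=5):
    """One-row frame with a single bright pixel at the given column."""
    return [[1.0 if x == position else 0.0 for x in range(width)]]

# A bright dot moving one pixel per frame across three consecutive frames.
frames = [dot_frame(p) for p in (1, 2, 3)]
blurred_frame = average_frames(frames)
sharp_frame = frames[1]  # the middle frame can serve as the sharp counterpart
```

The averaged row spreads the dot over the three positions it visited, so the blur follows the actual motion path rather than a fixed synthetic kernel.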

This article uses the PyTorch deep learning framework. The neural network requires a large amount of data to train and test, as shown in Table 4. We use the Adam optimizer with a mini-batch size of 4 for training. The learning rate starts at 5×10^{-5} and is adjusted adaptively. After 1.5×10^{5} iterations, the learning rate is reduced to 1/10 of its previous value. The overall training requires 4.5×10^{5} iterations.
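The learning-rate schedule can be written as a small step function. It assumes a single decay at 1.5×10^{5} iterations, since the text does not say whether the reduction repeats; the constant and function names are hypothetical.

```python
BASE_LR = 5e-5          # initial learning rate
DECAY_AT = 150_000      # iteration at which the rate drops (assumed to happen once)
TOTAL_ITERS = 450_000   # total number of training iterations

def learning_rate(iteration):
    """Learning rate used at a given 0-based training iteration."""
    if not 0 <= iteration < TOTAL_ITERS:
        raise ValueError("iteration is outside the training run")
    return BASE_LR if iteration < DECAY_AT else BASE_LR / 10

for it in (0, DECAY_AT - 1, DECAY_AT, TOTAL_ITERS - 1):
    print(it, learning_rate(it))
```

Under this reading, the first third of training runs at the initial rate and the remaining two thirds at one tenth of it.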

QUANTITATIVE COMPARISON OF DEBLURRING PERFORMANCE ON THE GOPRO DATASET

PSNR (dB) | SSIM | Runtime |
---|---|---|
26.64 | 0.9142 | 0.93 s |
27.33 | 0.9324 | 0.72 s |
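For reference, PSNR can be computed as in the sketch below. It assumes 8-bit grayscale images stored as nested lists with a peak value of 255, and the function name `psnr` is hypothetical. SSIM is omitted because it needs local window statistics.

```python
import math

def psnr(restored, reference, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images."""
    n = len(reference) * len(reference[0])
    mse = sum((a - b) ** 2
              for row_a, row_b in zip(restored, reference)
              for a, b in zip(row_a, row_b)) / n
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

reference = [[0.0, 0.0], [0.0, 0.0]]
worst_case = [[255.0, 255.0], [255.0, 255.0]]
slightly_off = [[1.0, 0.0], [0.0, 0.0]]
print(psnr(slightly_off, reference))
```

A higher PSNR means a smaller mean squared error relative to the peak value, so a gain of a fraction of a decibel corresponds to a measurable drop in pixel-wise error.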

The blurred images in the test set are sent to the generator for processing to obtain the deblurred images; no discriminator is needed in this process. Compared with the original author's data, PSNR and SSIM are improved (27.33 versus 26.64 and 0.9324 versus 0.9142), and the running time is shortened (0.72 s versus 0.93 s). In the image shown in the Figure

Comparison of GOPRO dataset test results

The main content of this paper is motion image deblurring. In view of the poor results of existing motion deblurring, an end-to-end blind deblurring approach is proposed. Blind deblurring does not depend on the estimation of the blur kernel; it restores the image directly with a neural network. In the network structure of this paper, multi-scale fusion uses convolution kernels of different sizes to extract image features at multiple scales and process the texture details of the image. Local residual learning first fuses the extracted feature maps and second reduces the computational load of the network. The combination of the multi-scale loss function and the adversarial loss function constrains the generation of the clear image, making the final image closer to the real image. The whole network structure is simple and is suitable for dealing with image degradation caused by motion blur. Future work will focus on the following aspects:

1) The algorithm is still lacking in the restoration of the details of the blurred image, and it is necessary to further modify the network design to improve the clarity of the image.

2) The multi-scale residual network structure extracts feature maps of different scales through convolution kernels of different sizes, and will be considered for application in the field of restoring blurred videos in the future.