Image-Denoising

Background

Image noise is random variation of brightness or color information in an image. It can have multiple sources: noise can be introduced at any stage of the image capture pipeline, from light variation and camera optics to the image sensor and image storage.

The Problem/Motivation

One of the fundamental challenges in image processing and computer vision is image denoising, where the goal is to estimate the original image by suppressing noise in a contaminated image. Image denoising has numerous applications, such as:

Image denoising is also useful as a preprocessing step for several computer vision tasks where recovering the original image content is crucial for strong performance:

This project aims to extract a clean image $x$ from the noisy image $y$, where $n$ is the noise component, according to

$$y = x + n.$$
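As a minimal sketch of this additive model, the snippet below corrupts an image with zero-mean Gaussian noise of a chosen standard deviation. The function name `add_awgn` and the 0 to 255 pixel scale are assumptions made for illustration; they are not taken from the project code.

```python
import numpy as np

def add_awgn(clean, sigma, rng=None):
    """Return clean + noise, with noise drawn i.i.d. from N(0, sigma^2) per pixel."""
    rng = np.random.default_rng() if rng is None else rng
    clean = np.asarray(clean, dtype=np.float64)
    noise = rng.normal(loc=0.0, scale=sigma, size=clean.shape)
    return clean + noise

# Usage: a flat grey image (assumed 0-255 scale) corrupted with sigma = 25.
rng = np.random.default_rng(0)
clean = np.full((64, 64), 128.0)
noisy = add_awgn(clean, sigma=25, rng=rng)
```

The result is left unclipped here; clipping to the valid pixel range, if needed, would slightly change the effective noise level.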

Problem Scope

We limit the problem scope to additive white Gaussian noise (AWGN) only, and demonstrate how supervised and unsupervised techniques can be used to denoise images corrupted with AWGN.

Metrics

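PSNR (Peak Signal-to-Noise Ratio)

PSNR is an engineering term for the ratio between the maximum possible power of a signal and the power of the corrupting noise that affects the fidelity of its representation. PSNR is most easily defined via the mean squared error (MSE). Given a noise-free m×n monochrome image I and its noisy approximation K, MSE is defined as:

$$\mathrm{MSE} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left[ I(i,j) - K(i,j) \right]^2$$

The PSNR (in dB) is defined as:

$$\mathrm{PSNR} = 10 \log_{10} \left( \frac{\mathrm{MAX}_I^2}{\mathrm{MSE}} \right)$$

where $\mathrm{MAX}_I$ is the maximum possible pixel value of the image (255 for 8-bit images).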
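The two definitions can be sketched in a few lines of NumPy. The function names and the default peak value of 255 (8-bit images) are assumptions for illustration.

```python
import numpy as np

def mse(reference, test):
    """Mean squared error between two images of the same shape."""
    reference = np.asarray(reference, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    return float(np.mean((reference - test) ** 2))

def psnr(reference, test, max_value=255.0):
    """PSNR in dB; infinite when the two images are identical."""
    err = mse(reference, test)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / err))

# Usage: a constant error of 25.5 gives MSE = 650.25, i.e. about 20 dB.
reference = np.zeros((4, 4))
shifted = np.full((4, 4), 25.5)
value = psnr(reference, shifted)
```

For unclipped Gaussian noise the expected MSE equals sigma squared, so sigma = 25 corresponds to roughly 10 log10(65025 / 625), or about 20.2 dB, for the noisy input.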

SSIM

MSE and PSNR estimate absolute errors. SSIM [9], in contrast, is a perception-based model that treats image degradation as a perceived change in structural information, while also incorporating important perceptual phenomena, including both luminance masking and contrast masking terms. Structural information is the idea that pixels have strong inter-dependencies, especially when they are spatially close. These dependencies carry essential information about the structure of the objects in the visual scene. Luminance masking is a phenomenon whereby image distortions tend to be less visible in bright regions, while contrast masking is a phenomenon whereby distortions become less visible where there is significant activity or "texture" in the image.
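To make the luminance and contrast/structure terms concrete, here is a simplified sketch that computes SSIM once over the whole image. This is an assumption made for brevity: the usual SSIM averages the same expression over local sliding windows, so values from this sketch will not match library implementations exactly. The constants 0.01 and 0.03 are the commonly used defaults.

```python
import numpy as np

def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM computed from global means, variances and covariance."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2  # stabilises the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilises the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast_structure = (2 * cov_xy + c2) / (var_x + var_y + c2)
    return float(luminance * contrast_structure)

# Usage: identical images score 1; adding noise lowers the score.
rng = np.random.default_rng(0)
img = rng.uniform(0, 255, size=(32, 32))
score_same = global_ssim(img, img)
score_noisy = global_ssim(img, img + rng.normal(0, 50, size=img.shape))
```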

Data

As we have multiple approaches and experiments, we chose a common dataset, CBSD68 [3], to analyze results. CBSD68 is commonly used for benchmarking in the image denoising domain. It contains 68 images and the corresponding noisy images at different sigma levels.

Because this dataset has very few samples, we also used other datasets for training in the supervised approach. We explored other datasets for the unsupervised approach as well, as mentioned below.

Supervised:

Unsupervised

Approach 1 (Supervised)

In this approach, we used supervised learning to learn the clean image given a noisy image. The function approximator is a neural network composed of convolutional and residual blocks, as shown in the figure below. Two experiments were conducted, one with purely convolutional layers and the other with a mix of convolutional and residual blocks, as detailed below.

Experiment 1 (Deep CNNs)

The code is available here and here.

Datasets

Two datasets were used in this experiment: PASCAL VOC 2010 [2] and CBSD68. The PASCAL training data contains approximately 10k images. This dataset is split into training, validation and test sets in the ratio 80:10:10. CBSD68 is used for testing only in this experiment.

Architecture

As shown in the figure below, the architecture takes an input image and passes it through three convolutional layers: 64 filters with a 9x9 kernel, 32 filters with a 5x5 kernel, and 1 filter with a 5x5 kernel, respectively. ReLU activations are used in all the layers. The convolutions use a stride of 1 and no padding, so the output loses 8 pixels on each side (4 + 2 + 2 from the three kernels). To accommodate this, we can either pad the image or supply a larger input. We chose the latter, as in [1].
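The size reduction can be checked with a little arithmetic. The helper below is a hypothetical illustration (not part of the project code) that applies the standard output-size rule for an unpadded convolution to the three kernel sizes listed above.

```python
def valid_conv_output_size(input_size, kernel_sizes, stride=1):
    """Spatial size after a chain of unpadded ('valid') convolutions."""
    size = input_size
    for k in kernel_sizes:
        size = (size - k) // stride + 1
    return size

KERNELS = (9, 5, 5)  # kernel sizes of the three layers

out_size = valid_conv_output_size(33, KERNELS)       # 33 -> 25 -> 21 -> 17
border = (33 - out_size) // 2                        # 8 pixels lost per side
eval_size = valid_conv_output_size(200, KERNELS)     # 200 -> 192 -> 188 -> 184
```

So a 33x33 input yields a 17x17 output, and a 200x200 crop yields a 184x184 output.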

Figure 1. Network architectures used in (left) Experiment 1 and (right) Experiment 2. The graph is generated using the Netron app [5]

Data Augmentation/Pre-processing

The input size is 33x33 and the output size is 17x17. As input images in the PASCAL dataset (and other datasets) have varied dimensions, we crop them during preprocessing. The crop is taken from a random part of the image, so it also acts as a data augmentation technique. Noise is then added to the 33x33 input crop, with sigma chosen at random between 10 and 50. The corresponding clean 17x17 center of the crop is the target.
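A minimal sketch of this preprocessing step is shown below. The function name, the uniform sampling of sigma, and the 0 to 255 pixel scale are assumptions; the actual project code may differ.

```python
import numpy as np

def sample_training_pair(image, rng, in_size=33, out_size=17, sigma_range=(10, 50)):
    """Random crop, noisy input, and clean centre target for one training sample."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - in_size + 1)    # random crop position (augmentation)
    left = rng.integers(0, w - in_size + 1)
    clean_patch = image[top:top + in_size, left:left + in_size].astype(np.float64)
    sigma = rng.uniform(*sigma_range)         # assumed: sigma drawn uniformly
    noisy_patch = clean_patch + rng.normal(0.0, sigma, size=clean_patch.shape)
    margin = (in_size - out_size) // 2        # 8 pixels on each side
    target = clean_patch[margin:margin + out_size, margin:margin + out_size]
    return noisy_patch, target, sigma

# Usage with a synthetic greyscale image.
rng = np.random.default_rng(0)
image = rng.uniform(0, 255, size=(100, 120))
noisy_patch, target, sigma = sample_training_pair(image, rng)
```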

Training:

PyTorch [4] is used to write the code, and the network is trained on Google Colab GPUs. Training is done in batches of 128, each pairing 33x33 noisy input images with the corresponding 17x17 clean target images. MSE loss and the Adam optimizer were used with a learning rate of 0.001. The MSE loss is computed between the network's output and the 17x17 clean target. Training ran for 100 epochs in this configuration. As the loss stagnated at that point, we reduced the learning rate to 0.0001 and trained for another 50 epochs. After this, we added a residual block to the network with randomly initialized weights, leaving the weights of the other layers unchanged. This network was trained for another 50 epochs with a learning rate of 0.01. We stopped training at this point because of the long training time (50 epochs took approximately 2 hours), even though [1] shows that adding more residual blocks improves the PSNR scores further. At all stages of training, the validation loss was also computed and monitored to check that the network generalizes to unseen data.

Note: to experiment further with residual blocks, Experiment 2 was performed, as detailed below.

Training and validation loss graph

What is new in our approach?

During training we add a random amount of noise to the images instead of using a fixed sigma. The resulting model generalized well to all sigmas from 10 to 50 during evaluation.

Results and observations:

The average PSNR and SSIM scores of the best model on the PASCAL test set are given below. Note that the best model has 3 layers, as the 5-layer one could not be trained completely due to computing constraints. An input crop size of 200 was used to produce these results instead of 33, except where noted. In each cell, the left value is the average score of the noisy input and the right value is the average score of the denoised output.

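| Sigma | PSNR (dB), noisy to denoised | SSIM, noisy to denoised |
| --- | --- | --- |
| 10 | 28.33 to 31.92 | 0.73 to 0.90 |
| 25 | 20.63 to 28.94 | 0.44 to 0.83 |
| 50 | 15.13 to 25.66 | 0.24 to 0.70 |
| 50 (crop 33) | 15.16 to 26.77 | 0.22 to 0.69 |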

The same model was tested on the CBSD68 dataset. The average PSNR and SSIM scores are as follows:

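| Sigma | PSNR (dB), noisy to denoised | SSIM, noisy to denoised |
| --- | --- | --- |
| 10 | 28.26 to 33.33 | 0.75 to 0.93 |
| 25 | 20.48 to 29.45 | 0.45 to 0.85 |
| 50 | 14.97 to 25.67 | 0.25 to 0.71 |
| 50 (crop 33) | 15.04 to 26.68 | 0.23 to 0.69 |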

The above results indicate that the model generalizes well to other datasets corrupted with similar AWGN. The PSNR achieved is somewhat lower than the best result in the paper [1], as we train only 3 layers.

Experiment 2 (Deep ResNets)

In this experiment we add residual connections to the convolutional denoising network. Since residual networks are memory intensive, we train the network on a different, smaller dataset, DIV2K [8], and test it on our validation set, CBSD68. The DIV2K dataset consists of 800 very high-resolution images.

Residual networks

It is known that very deep neural networks have very high representational power but are much harder to train than shallow networks. This can be attributed to vanishing gradients during backpropagation, i.e. very little learning happens in the first few layers of the network. Residual connections between layers address this: they allow gradients to flow directly to the earlier layers, enabling more efficient learning. Essentially, the network formed by these residual connections is comparable to a shallow network embedded within the deeper network, so we retain the generalizing power of a shallow network.

Dataset

We use a PyTorch DataLoader to set up the data pipeline. We extract random 128x128 crops of the images as inputs and randomly flip them horizontally and vertically for data augmentation. We then add Gaussian noise to each crop to obtain the noisy image, and return the pair of original and noisy images as the training input to our network. We set the batch size to 8, the maximum our GPU memory allows.

Architecture

We use 8 convolutional layers with a skip connection between every convolutional layer and the output of the layer following it. These skip connections allow us to train a much deeper network. The network learns to output the noise component of the image, i.e. it learns to separate the noise from the ground-truth image. To obtain the denoised image, we subtract the output of our model from the noisy image.

Implementation and Hyperparameters

Each convolutional layer has 64 filters, a kernel size of 3, a stride of 1 and padding of 1. This combination preserves the spatial size of the input after the forward pass, which lets us stack these layers arbitrarily (as needed for ResNet architectures). We use a ReLU activation after each convolutional layer. We also disable the bias terms of the layers, which reduced the artifacts present in the output image after denoising. For optimization we use the Adam optimizer with a learning rate of 0.001 and train the network for 5 epochs. To improve convergence, we also use a learning rate scheduler that reduces the learning rate by a factor of 10 if there is no improvement for 3 epochs.
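The configuration above might look roughly like the following PyTorch sketch. Several details are assumptions, since the text does not pin them down: the class name, the exact wiring of the skip connections (here, one around each intermediate layer), a final layer that maps back to 3 image channels without ReLU so that the predicted noise can be negative, and the use of `ReduceLROnPlateau` as the scheduler.

```python
import torch
from torch import nn

class ResidualDenoiser(nn.Module):
    """Predicts the noise component; subtract the output from the noisy input."""

    def __init__(self, channels=3, features=64, num_layers=8):
        super().__init__()
        conv = dict(kernel_size=3, stride=1, padding=1, bias=False)  # size-preserving, no bias
        self.head = nn.Conv2d(channels, features, **conv)
        self.body = nn.ModuleList(
            [nn.Conv2d(features, features, **conv) for _ in range(num_layers - 2)]
        )
        self.tail = nn.Conv2d(features, channels, **conv)
        self.act = nn.ReLU()

    def forward(self, noisy):
        x = self.act(self.head(noisy))
        for layer in self.body:
            x = x + self.act(layer(x))  # assumed skip-connection wiring
        return self.tail(x)             # estimated noise

model = ResidualDenoiser()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Divide the learning rate by 10 after 3 epochs without improvement;
# call scheduler.step(validation_loss) once per epoch.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=3
)

noisy = torch.rand(1, 3, 32, 32)
with torch.no_grad():
    denoised = noisy - model(noisy)  # same spatial size as the input
```

Because every layer preserves the spatial size, the same model can be applied to full images of any resolution at evaluation time.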

Results and Observations

During evaluation, we apply the network on the whole image as the convolutional operations can be applied on any image size.

The results are documented in the tables below. We obtain reasonable improvements in PSNR (25.6) and SSIM (0.85), with PSNR comparable to our other models. The training and validation losses are very close, which suggests there is room for further improvement that could be explored with more compute resources.

What is new in our approach?

Another novelty is that we pass the denoised image back into the model for further refinement. We observe that the PSNR drops slightly but the SSIM score improves by about 0.1 (especially at larger noise levels). This is similar to the iterative application in our PCA approach.
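The refinement loop can be sketched generically as below. `denoise_fn` stands in for the trained network; the 3x3 box blur used here is only a hypothetical placeholder so that the sketch runs without a model, and it says nothing about the PSNR or SSIM behaviour of the real network.

```python
import numpy as np

def refine(noisy, denoise_fn, passes=2):
    """Feed the denoiser's output back into itself; return the result of every pass."""
    outputs = []
    current = np.asarray(noisy, dtype=np.float64)
    for _ in range(passes):
        current = denoise_fn(current)
        outputs.append(current)
    return outputs

def box_blur(img):
    """Placeholder denoiser: 3x3 mean filter with edge padding."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(padded[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

# Usage: flat image plus Gaussian noise, refined twice.
rng = np.random.default_rng(0)
noisy = 128.0 + rng.normal(0, 25, size=(64, 64))
outputs = refine(noisy, box_blur, passes=2)
```

Keeping the output of every pass makes it easy to compare metrics after each iteration.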

Limitations

The following are limitations of the supervised approaches:

Approach 2 (Unsupervised)

In this approach we used unsupervised learning techniques to solve the problem of image denoising. There are two experiments, as follows:

Experiment 3 (Vanilla PCA)

Principal component analysis (PCA) is an orthogonal transformation that finds the directions of maximum variance in the data and is commonly used for dimensionality reduction. The directions with the largest variance carry most of the information needed to represent the whole dataset. In image denoising, one has to balance suppressing noise against preserving high-variance image detail. We start by looking at how PCA inherently reduces the noise in an image.

The basic intuition behind denoising the image is that any components with variance much larger than the effect of the noise should be relatively unaffected by the noise. So if you reconstruct the data using just the most significant subset of principal components, you should be preferentially keeping the signal and throwing out the noise. Though this is not an efficient approach (a better one, based on a modified PCA, is covered in the next section), we can examine how plain vanilla PCA improves the PSNR (peak signal-to-noise ratio) of an image.
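The idea of reconstructing from only the leading principal components can be sketched with NumPy's SVD. The function name and the synthetic low-rank data are assumptions for illustration; the project notebook may use a library PCA implementation instead.

```python
import numpy as np

def pca_denoise(samples, n_components):
    """Project (n_samples, n_features) data onto its top components and reconstruct."""
    mean = samples.mean(axis=0)
    centered = samples - mean
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    return centered @ basis.T @ basis + mean

# Usage: a rank-2 signal in 20 dimensions, corrupted with Gaussian noise.
rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 20))
noisy = clean + 0.3 * rng.normal(size=clean.shape)
denoised = pca_denoise(noisy, n_components=2)

err_noisy = np.mean((noisy - clean) ** 2)
err_denoised = np.mean((denoised - clean) ** 2)
```

In this toy setting the signal lives in two directions, so discarding the other eighteen removes most of the noise. Natural images are not exactly low-rank, which is why the gain on real images is smaller.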

We tried the plain vanilla PCA method on the MNIST digit dataset, and then on RGB images. The approach is:

Before PCA transformation the digit dataset looks like this:

Digits data before denoising

We then add random Gaussian noise to blur the pixels. After adding the noise, the digit dataset looks like this:

Adding Random Gaussian noise to the data

Next we look at the number of components needed to capture most of the variance in the data. From the figure below, we can see that the first 10 components capture 80 percent of the variance in the data.
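One way to find that number is to accumulate the explained-variance ratios until a target fraction is reached. The sketch below assumes a hypothetical helper name and uses synthetic data rather than the digit dataset.

```python
import numpy as np

def components_for_variance(samples, target=0.80):
    """Smallest number of principal components whose cumulative variance ratio >= target."""
    centered = samples - samples.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    ratio = singular_values ** 2 / np.sum(singular_values ** 2)
    cumulative = np.cumsum(ratio)
    count = int(np.searchsorted(cumulative, target) + 1)
    return min(count, len(cumulative))

# Usage: one dominant direction (std 10) and three weak ones (std 1).
rng = np.random.default_rng(0)
data = rng.normal(size=(5000, 4)) * np.array([10.0, 1.0, 1.0, 1.0])
n_80 = components_for_variance(data, target=0.80)
n_999 = components_for_variance(data, target=0.999)
```

Plotting `cumulative` against the component index gives the component-versus-variance curve.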

Plotting Component vs variance graph

Next, we reconstruct and plot the noisy digit data using the first 10 components, and we can see that PCA preserves the signal and discards the noise:

Denoised data

Let's run the same experiment on an RGB image to see if there is an improvement in PSNR after PCA. The method remains the same:

Limitation

We ran the above process on the CBSD68 dataset provided by Berkeley, which contains both original and noisy images at different Gaussian noise levels. The figures below compare the PSNR and SSIM values before and after PCA decomposition of a noisy image. The limitation of vanilla PCA is that it does not necessarily reduce the noise; it only retains the directions of highest variance. To illustrate, consider an original image, its noisy version, and the denoised result below:

Figure: original image, noisy image (sigma = 50), and denoised image (sigma = 50).

As the images above show, the results are not very good, although the PSNR and SSIM values do improve, because the denoised image retains the pixels with higher variance. That is why most of the denoised image is brownish, as brown is the prominent color in the original image as well.

Figure: PSNR comparison across images and SSIM comparison across images, for Gaussian noise level 50 (left) and Gaussian noise level 25 (right).

To rerun the experiment, please clone this repository and run the PCA.ipynb notebook in the notebooks directory.

Experiment 4 (Local Pixel Grouping - Principal Component Analysis)

The code is available here.

Approach

This approach uses principal component analysis (PCA) with local pixel grouping (LPG) for image denoising. It is based on the general observation that the energy of a signal concentrates on a small subset of the PCA-transformed dataset, while the energy of noise spreads evenly over the whole dataset. Each pixel is vectorised so that local structure information is preserved. Local pixel grouping, implemented with a block-matching method, is done over nearby pixels to select similar samples. Applying PCA to these samples suppresses the noise. Our experiments show that this approach can be applied iteratively, with appropriate adaptive tuning of the noise parameter, to improve the denoising performance.

Details of Featurisation based on [7]:

  1. Given a pixel, a K x K window centered on that pixel is taken and flattened to generate a vector.
  2. A larger L x L window is taken around the previous window, and the pixels in this L x L window are vectorised as in step 1 to generate a set of samples.
  3. From the samples generated, we apply local pixel grouping to select the samples that are similar to the central K x K block from step 1.
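The three steps above can be sketched as follows. This is a simplified illustration under stated assumptions: the function name is hypothetical, similarity is measured by mean squared difference, a fixed number `n_keep` of the closest blocks is retained, and the pixel is assumed to lie far enough from the image border. The block-matching rule in the project code or in [7] may differ.

```python
import numpy as np

def group_similar_patches(image, row, col, k=3, l=7, n_keep=10):
    """Return the n_keep flattened K x K patches within the L x L window around
    (row, col) that are closest to the central patch."""
    hk, hl = k // 2, l // 2
    # Step 1: central K x K window, flattened to a vector of length k*k.
    center = image[row - hk:row + hk + 1, col - hk:col + hk + 1].ravel()
    # Step 2: every K x K block that fits inside the L x L training window.
    patches = []
    for r in range(row - hl + hk, row + hl - hk + 1):
        for c in range(col - hl + hk, col + hl - hk + 1):
            patches.append(image[r - hk:r + hk + 1, c - hk:c + hk + 1].ravel())
    patches = np.asarray(patches, dtype=np.float64)
    # Step 3: keep the blocks most similar to the central one.
    distance = np.mean((patches - center) ** 2, axis=1)
    order = np.argsort(distance)[:n_keep]
    return patches[order]

# Usage on a synthetic image.
rng = np.random.default_rng(0)
image = rng.uniform(0, 255, size=(20, 20))
samples = group_similar_patches(image, row=10, col=10)
```

Each row of `samples` is one training vector; the central block itself is always the closest match and therefore comes first.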

The following picture illustrates the pixel to be denoised, the feature vector and the training block.

For this experiment K = 3, L = 7, and 250 blocks of K X K are chosen from L X L windows.

We explored how the variance is distributed across the principal components of individual pixels at different noise levels. Below are the results for a typical pixel:

From the above plots it can be observed that the variance of the signal concentrates on a small subset of principal components, while the variance of the noise spreads evenly over all components. So, based on the estimated value of sigma, the least significant components can be dropped and the image can still be reconstructed with good quality. Most of the dropped components contain noise, so the reconstructed image is a denoised image. But as sigma increases, the energy becomes spread across all the components, which makes it difficult to estimate how many principal components to retain for image reconstruction.
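As a rough sketch of dropping components based on sigma, the snippet below keeps only the principal components whose variance exceeds the noise variance. This hard-threshold rule and the function name are assumptions made for illustration; the project code and the method in [7] may use a different rule.

```python
import numpy as np

def denoise_group(samples, sigma):
    """Keep principal components with variance above sigma**2 and reconstruct.

    samples: (n_samples, n_features) array of noisy, grouped patch vectors.
    Returns the reconstructed samples and the number of components kept.
    """
    mean = samples.mean(axis=0)
    centered = samples - mean
    covariance = centered.T @ centered / samples.shape[0]
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    keep = eigenvalues > sigma ** 2          # assumed hard threshold at the noise variance
    basis = eigenvectors[:, keep]
    return centered @ basis @ basis.T + mean, int(keep.sum())

# Usage: a strong rank-1 signal in 9 dimensions plus unit-variance noise.
rng = np.random.default_rng(0)
direction = rng.normal(size=(1, 9))
clean = 5.0 * rng.normal(size=(500, 1)) @ direction
noisy = clean + rng.normal(0.0, 1.0, size=clean.shape)
denoised, kept = denoise_group(noisy, sigma=1.0)

err_noisy = np.mean((noisy - clean) ** 2)
err_denoised = np.mean((denoised - clean) ** 2)
```

When sigma is large relative to the signal, few or no eigenvalues stand clearly above the threshold, which mirrors the difficulty described above.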

The following is the two-stage pipeline using LPG-PCA:

This approach takes the noisy image and an estimated value of sigma as input.

Results and Observations

This approach works well for image denoising while retaining structural similarity at moderate noise levels (sigma < 30). For images with heavy noise (sigma > 30), the quality of the denoised output is poor. We observed that PSNR and SSIM improve by about 1 to 2 and 0.1 to 0.2 respectively from the first iteration to the second with this approach.

Limitations

The following are limitations of this approach:

Results comparison across approaches:

Average PSNR on CBSD68 dataset for all experiments:

| Sigma | Deep CNNs | Deep ResNets | Vanilla PCA | LPG-PCA |
| --- | --- | --- | --- | --- |
| 10 | 28.26 to 33.33 | 28.26 to 30.29 | 28.26 to 26.46 | 28.26 to 32.93 |
| 25 | 20.48 to 29.45 | 20.48 to 25.9 | 20.48 to 22.93 | 20.48 to 27.72 |
| 50 | 14.97 to 25.67 | 14.97 to 22.8 | 14.97 to 18.60 | 14.97 to 17.93 |

Average SSIM on CBSD68 dataset for all experiments:

| Sigma | Deep CNNs | Deep ResNets | Vanilla PCA | LPG-PCA |
| --- | --- | --- | --- | --- |
| 10 | 0.75 to 0.93 | 0.75 to 0.937 | 0.75 to 0.9 | 0.75 to 0.95 |
| 25 | 0.45 to 0.85 | 0.45 to 0.85 | 0.45 to 0.72 | 0.45 to 0.85 |
| 50 | 0.25 to 0.71 | 0.25 to 0.78 | 0.25 to 0.46 | 0.25 to 0.38 |

Qualitative Results (from all approaches/experiments):

Results with sigma = 50

| Image | PSNR | SSIM |
| --- | --- | --- |
| Original | NA | NA |
| Noisy input (sigma = 50) | 14.91 | 0.31 |
| Deep CNN denoised output | 24.25 | 0.73 |
| Deep ResNet denoised output | 21.6 | 0.70 |
| Vanilla PCA denoised output | 19.15 | 0.58 |
| LPG-PCA denoised output | 22.79 | 0.72 |

Results with sigma = 25

| Image | PSNR | SSIM |
| --- | --- | --- |
| Original | NA | NA |
| Noisy input (sigma = 25) | 20.19 | 0.21 |
| Deep CNN denoised output | 32.78 | 0.84 |
| Deep ResNet denoised output | 29.07 | 0.86 |
| Vanilla PCA denoised output | 19.15 | 0.58 |
| LPG-PCA denoised output | 29.0 | 0.81 |

Results with sigma = 10

| Image | PSNR | SSIM |
| --- | --- | --- |
| Original | NA | NA |
| Noisy input (sigma = 10) | 28.12 | 0.61 |
| Deep CNN denoised output | 35.41 | 0.94 |
| Deep ResNet denoised output | 32.5 | 0.943 |
| Vanilla PCA denoised output | 19.15 | 0.58 |
| LPG-PCA denoised output | 33.97 | 0.95 |

Conclusion

In this project we conducted experiments with supervised and unsupervised machine learning algorithms for image denoising. We began with vanilla PCA to understand how high-variance components are retained in the image, which gave us the intuition for removing noisy components. We saw how techniques such as DN-ResNet and LPG-PCA give better results with good PSNR values. We also observed that high PSNR or SSIM values do not necessarily mean that an image looks good aesthetically.

References:

  1. Ren, H., El-Khamy, M., & Lee, J. (2019). DN-ResNet: Efficient Deep Residual Network for Image Denoising. Computer Vision – ACCV 2018, Lecture Notes in Computer Science, 215–230. doi: 10.1007/978-3-030-20873-8_14
  2. The PASCAL Visual Object Classes Challenge 2010 (VOC2010) Results. (n.d.). Retrieved from here.
  3. Clausmichele. (n.d.). clausmichele/CBSD68-dataset. Retrieved from here.
  4. pytorch/pytorch. Retrieved from here.
  5. lutzroeder/netron. Retrieved from here.
  6. Vanderplas, Jacob T. Python Data Science Handbook: Tools and Techniques for Developers. O'Reilly, 2016.
  7. Zhang, L., Dong, W., Zhang, D., & Shi, G. (2010). Two-stage image denoising by principal component analysis with local pixel grouping. Pattern Recognition, 43(4), 1531–1549. doi: 10.1016/j.patcog.2009.09.023
  8. DIV2K dataset: DIVerse 2K resolution high quality images as used for the challenges.
  9. Structural similarity. Wikipedia. https://en.wikipedia.org/wiki/Structural_similarity

Contributions

All members of the project contributed equally to discussions, project formulation and report generation. For individual learning, each member focused on the following: