We use a GAN-based architecture to reconstruct high-resolution images from low-resolution inputs, preserving image structure while improving fidelity and sharpness.
- Santosh C
- Prasanna Kumar
- Aaryan Patil
- Arnav Santosh
- SRGAN trained on an image dataset of more than 6000 high-resolution/low-resolution training pairs, plus 273 corresponding validation pairs (publicly available here)
- Performs data augmentation on the training images
- Includes dataset loading, the model architecture and the training loop
- Includes an easy-to-use upscale function that upscales any image given its file path
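The augmentation step is not spelled out here, so the following is only a minimal NumPy sketch of one common paired scheme, assuming 4x upscaling and 24x24 low-resolution patches (both hypothetical choices). What it illustrates is that the LR and HR images must receive the same crop offset (scaled by the upscaling factor) and the same flip and rotation, so the pair stays pixel-aligned. `augment_pair` is an illustrative name, not the project's function.

```python
import numpy as np

def augment_pair(lr, hr, scale=4, lr_patch=24, rng=None):
    """Hypothetical paired augmentation: aligned random crop, then identical flip/rotation.

    lr: (H, W, C) low-resolution array; hr: (H*scale, W*scale, C) matching high-resolution array.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = lr.shape[:2]
    # Random LR crop offset; the matching HR crop starts at scale * the same offset.
    y = int(rng.integers(0, h - lr_patch + 1))
    x = int(rng.integers(0, w - lr_patch + 1))
    lr_crop = lr[y:y + lr_patch, x:x + lr_patch]
    hr_crop = hr[y * scale:(y + lr_patch) * scale, x * scale:(x + lr_patch) * scale]
    # Horizontal flip with probability 0.5, applied to both images together.
    if rng.random() < 0.5:
        lr_crop, hr_crop = lr_crop[:, ::-1], hr_crop[:, ::-1]
    # Rotate both images by the same random multiple of 90 degrees.
    k = int(rng.integers(0, 4))
    lr_crop, hr_crop = np.rot90(lr_crop, k), np.rot90(hr_crop, k)
    # Return contiguous copies (flips/rotations produce strided views).
    return np.ascontiguousarray(lr_crop), np.ascontiguousarray(hr_crop)
```

Because every geometric operation is applied to both arrays, an HR patch that was an exact 4x block-upsampling of its LR patch before augmentation remains one afterwards.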
Classical image enhancement techniques such as nearest-neighbor, bilinear and bicubic interpolation produce poor results when upscaling images, often blurring edges and lowering contrast. In recent years, with the broader development of generative models, deep learning has naturally been applied to the task of image super-resolution.
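To see the behavior described above, a small Pillow sketch (Pillow is already a dependency) can upscale the same image with all three classical filters. The helper name `classical_upscale` and the 4x factor are illustrative assumptions. On an image with a hard vertical edge, nearest-neighbor keeps only the two original intensities but produces blocky output, while bilinear and bicubic introduce intermediate gray values that smear the edge.

```python
import numpy as np
from PIL import Image

def classical_upscale(img, scale=4):
    """Upscale a PIL image with the three classical interpolation filters."""
    size = (img.width * scale, img.height * scale)  # PIL sizes are (width, height)
    return {
        "nearest": img.resize(size, Image.NEAREST),
        "bilinear": img.resize(size, Image.BILINEAR),
        "bicubic": img.resize(size, Image.BICUBIC),
    }

# An 8x8 grayscale image with a hard edge: left half black, right half white.
edge = np.hstack([np.zeros((8, 4), np.uint8), np.full((8, 4), 255, np.uint8)])
results = classical_upscale(Image.fromarray(edge))
for name, out in results.items():
    arr = np.asarray(out)
    # Count pixels that are neither pure black nor pure white (i.e. blurred edge pixels).
    print(name, out.size, int(((arr > 0) & (arr < 255)).sum()))
```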
Generative Adversarial Networks were introduced in a 2014 research paper and saw a major rise in popularity around 2019, roughly five years later. More recently, the trend has shifted toward diffusion-based models, which offer more stable training and higher-quality outputs in classical image processing tasks, notably image upscaling.
Nevertheless, this project explores and applies established deep learning techniques known to produce sharper, higher-fidelity upscaled images and videos, using Generative Adversarial Networks and other deep learning models (such as RNNs) for the task of super-resolution.
To perform SISO-SR (single-image, single-output super-resolution), we use a GAN consisting of a generator and a discriminator, whose architectures are shown below.
Fig 1. Generator
Fig 2. Discriminator
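For background, the generator and discriminator are trained against each other. The standard minimax objective from the original 2014 GAN formulation is shown first. The original SRGAN paper conditions the generator on the low-resolution image $I^{LR}$ and trains it with a perceptual loss that combines a content loss $l^{SR}_X$ (pixel-wise MSE or an MSE on VGG feature maps) with a weighted adversarial term, shown second. The loss actually used in this project, including any custom modifications, is not stated here, so treat these as reference formulas rather than the project's exact objective.

$$
\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
$$

$$
l^{SR} = l^{SR}_X + 10^{-3}\, l^{SR}_{Gen}, \qquad l^{SR}_{Gen} = \sum_{n=1}^{N} -\log D\big(G(I^{LR}_n)\big)
$$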
The core of the generator is a sequence of residual blocks. Upscaling is performed with pixel-shuffle layers. The input image is also upscaled bicubically and used as a baseline for the output, which keeps the output colors faithful to the input. Optional post-processing, such as sharpening, can then be applied.
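The upscaling mechanisms in the paragraph above can be sketched outside the network. `pixel_shuffle` below is a NumPy re-implementation of the channel-to-space rearrangement that PyTorch's `nn.PixelShuffle` performs, and `add_bicubic_baseline` shows one plausible way to use a bicubic upscale as the output baseline: adding a predicted residual to it. The residual-addition design, the [0, 1] value range, and all helper names are assumptions for illustration; the project's generator may combine these differently. `sharpen` uses Pillow's `ImageFilter.UnsharpMask` as one possible post-processing step.

```python
import numpy as np
from PIL import Image, ImageFilter

def pixel_shuffle(x, r):
    """Rearrange (C*r*r, H, W) -> (C, H*r, W*r), using the same ordering as torch.nn.PixelShuffle."""
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)       # split channel index into (c, i, j)
    x = x.transpose(0, 3, 1, 4, 2)     # reorder axes to (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)  # output pixel (h*r + i, w*r + j) <- input channel c*r*r + i*r + j

def add_bicubic_baseline(lr_img, residual, scale):
    """Hypothetical: add a generator residual of shape (H*scale, W*scale, 3), in [0, 1] units, to a bicubic upscale."""
    size = (lr_img.width * scale, lr_img.height * scale)
    base = np.asarray(lr_img.convert("RGB").resize(size, Image.BICUBIC), dtype=np.float32) / 255.0
    out = np.clip(base + residual, 0.0, 1.0)  # keep values in the valid range
    return Image.fromarray((out * 255.0).round().astype(np.uint8))

def sharpen(img):
    """Optional post-processing: unsharp masking with Pillow's default parameters."""
    return img.filter(ImageFilter.UnsharpMask(radius=2, percent=150, threshold=3))
```

With a zero residual, `add_bicubic_baseline` returns exactly the bicubic upscale, so the network only has to learn the high-frequency detail that interpolation misses.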
Fig 3. Website developed for the model using Flask.
Fig 4. Example output image. Note the high SSIM value.
The SRGAN model was evaluated on a separate test set of about 76 images of various objects, animals, places and people. For each test image, the evaluation metrics (PSNR and SSIM) were computed between the upscaled output and the original high-resolution image, and then averaged.
- Average PSNR: 23.048 dB
- Average SSIM: 0.9438
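Both metrics can be computed as in the sketch below. PSNR follows its standard definition, $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$ in decibels. `global_ssim` is a deliberately simplified SSIM that uses a single window over the whole image with the usual constants $K_1 = 0.01$ and $K_2 = 0.03$; the standard SSIM (for example scikit-image's `skimage.metrics.structural_similarity`) averages over local windows, so its values will differ. The project's own evaluation code is not shown, so these are illustrative helpers, not the code that produced the numbers above.

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))

def global_ssim(ref, test, max_val=255.0):
    """Simplified SSIM over one global window (standard SSIM averages local windows)."""
    x = ref.astype(np.float64)
    y = test.astype(np.float64)
    c1 = (0.01 * max_val) ** 2
    c2 = (0.03 * max_val) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2)) /
                 ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))
```

As a sanity check, two 8-bit images that differ by exactly 1 at every pixel have MSE = 1, so their PSNR is $20 \log_{10} 255 \approx 48.13$ dB.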
SRGAN was successfully implemented, and owing to custom modifications, its results surpass those reported in the original research paper on which the project was based. With a larger dataset and more training time, the model could be deployed in commercial image editing software.
- NumPy
- PyTorch
- Matplotlib
- Pillow
- Flask
- HTML
- CSS