Course: CSCI E-89 (Deep Learning), Harvard Extension School, Fall 2024
Code: github.com/nthapaliya/cnn-image-upscaling


Overview

Single-image super-resolution (SISR) is the task of recovering a plausible high-resolution image from a low-resolution input. This project trains and compares three CNN architectures at 4× upscaling on the FFHQ dataset (70,000 high-quality face images), measuring output quality with PSNR and SSIM.

Faces provide a structured benchmark domain where quality degradation is perceptually obvious and metrics are well-calibrated.

Architectures

Model   Key idea
SRCNN   Pioneering 3-layer super-resolution CNN (Dong et al., 2014)
ESPCN   Sub-pixel convolution (pixel shuffle) for efficient upscaling (Shi et al., 2016)
EDSR    Removes batch norm for more stable training at depth (Lim et al., 2017)
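
To make the sub-pixel convolution idea concrete, here is a minimal NumPy sketch of the pixel-shuffle rearrangement that ESPCN applies after its last convolution. It is an illustration, not the project's implementation: the function name `pixel_shuffle` and the channel ordering (row offset, column offset, output channel) are assumptions, and frameworks differ in how they order the channels.

```python
import numpy as np


def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange an (H, W, C*r*r) feature map into an (H*r, W*r, C) image.

    Assumed channel layout: index = (row_offset * r + col_offset) * C + channel.
    """
    h, w, c_in = x.shape
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c = c_in // (r * r)
    # Split the channel axis into (row offset, column offset, output channel).
    x = x.reshape(h, w, r, r, c)
    # Move each offset next to its spatial axis: (H, r, W, r, C).
    x = x.transpose(0, 2, 1, 3, 4)
    # Merge so every low-resolution pixel becomes an r x r block.
    return x.reshape(h * r, w * r, c)


# A 64x64 feature map with 3 * 4 * 4 = 48 channels becomes a 256x256 RGB image.
features = np.zeros((64, 64, 48), dtype=np.float32)
upscaled = pixel_shuffle(features, r=4)
```

Because the network runs all of its convolutions at low resolution and only rearranges values at the end, the upscaling step itself has no learned parameters; at 4× the final convolution just needs to emit 16 values per output channel.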

Evaluation

Both metrics are computed on the luminance channel (Y of YCbCr), following standard practice:

  • PSNR — Peak Signal-to-Noise Ratio (higher is better, measured in dB)
  • SSIM — Structural Similarity Index (higher is better, 0–1)
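
As a rough sketch of the PSNR computation on the Y channel, the NumPy code below converts RGB to luma and applies PSNR = 10 · log10(MAX² / MSE). Several details are assumptions rather than facts about this project: the helper names `rgb_to_y` and `psnr`, the ITU-R BT.601 conversion with the 16 to 235 range (common in super-resolution papers), and a peak value of 255. SSIM is more involved and is left to a library implementation.

```python
import numpy as np


def rgb_to_y(rgb: np.ndarray) -> np.ndarray:
    """Luma (Y of YCbCr, BT.601, range 16-235) from RGB floats in [0, 1].

    Assumption: this is the conversion used; the project may use another.
    """
    coeffs = np.array([65.481, 128.553, 24.966])
    return 16.0 + rgb @ coeffs


def psnr(reference: np.ndarray, estimate: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    diff = reference.astype(np.float64) - estimate.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


# Example with random stand-in images of shape (H, W, 3) in [0, 1].
rng = np.random.default_rng(0)
hr = rng.random((32, 32, 3))
sr = np.clip(hr + rng.normal(0.0, 0.01, hr.shape), 0.0, 1.0)
score = psnr(rgb_to_y(hr), rgb_to_y(sr))
```

Evaluating on Y alone keeps the score focused on structural detail, where upscaling errors are most visible, rather than on chroma.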

Dataset

FFHQ: 70,000 high-quality PNG face images at 1024×1024, downloaded via Kaggle. Low-resolution training inputs are created by bicubic downsampling (4× reduction). The data is split into 65,000 training and 5,000 test images.
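
One way such a split could be made reproducible is sketched below using only the standard library. The source does not say how the split was actually drawn, so the seeded shuffle, the function name `split_filenames`, and the zero-padded file names are all hypothetical.

```python
import random


def split_filenames(filenames, n_test=5_000, seed=0):
    """Return (train, test) lists from a seeded shuffle of the file names."""
    names = sorted(filenames)          # fix the order before shuffling
    random.Random(seed).shuffle(names)  # same seed gives the same split
    return names[n_test:], names[:n_test]


# Hypothetical file names standing in for the 70,000 FFHQ images.
all_names = [f"{i:05d}.png" for i in range(70_000)]
train_names, test_names = split_filenames(all_names)

# At 4x reduction, a 1024x1024 target pairs with a 256x256 input.
hr_size, scale = 1024, 4
lr_size = hr_size // scale
```

Sorting before shuffling means the result depends only on the seed and the set of file names, not on the order in which the file system lists them.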

Stack

TensorFlow 2.x, Keras, NumPy, Matplotlib, Kaggle API, uv