In this project, we use a conditional GAN (Pix2Pix) architecture to translate segmentation maps into realistic street-view images on the Cityscapes dataset.
Here are some examples from the validation set.
Discriminator and generator losses on the training and validation sets over 200 epochs.
For more details, see the notebook.
The discriminator consists of a sequence of Convolution-BatchNorm-ReLU blocks. Instead of classifying the whole image as real or fake, it classifies individual patches, and the final output of the discriminator is the average of these patch-level results. This discriminator architecture is known as PatchGAN.
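As a rough illustration, here is a minimal PatchGAN-style discriminator in PyTorch. The channel widths, number of blocks, and the `PatchDiscriminator` name are assumptions for this sketch, not necessarily the exact configuration used in the notebook:

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Minimal PatchGAN sketch: the input is the condition (segmentation map)
    concatenated channel-wise with the real or generated image."""

    def __init__(self, in_channels=6):  # 3 (segmentation) + 3 (image), assumed RGB
        super().__init__()

        def block(c_in, c_out, norm=True):
            layers = [nn.Conv2d(c_in, c_out, kernel_size=4, stride=2, padding=1)]
            if norm:
                layers.append(nn.BatchNorm2d(c_out))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers

        self.model = nn.Sequential(
            *block(in_channels, 64, norm=False),  # no norm on the first block
            *block(64, 128),
            *block(128, 256),
            nn.Conv2d(256, 512, kernel_size=4, stride=1, padding=1),
            nn.BatchNorm2d(512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=1),  # one logit per patch
        )

    def forward(self, segmentation, image):
        x = torch.cat([segmentation, image], dim=1)
        patch_logits = self.model(x)              # (N, 1, H', W') grid of patch scores
        return patch_logits.mean(dim=(1, 2, 3))   # average over patches, one logit per sample
```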
The discriminator loss is divided by 2 to slow down the rate at which D learns relative to G.
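In code, a sketch of this objective might look like the following (the `discriminator_loss` helper and the logit shapes are assumptions of this sketch):

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def discriminator_loss(d_real_logits, d_fake_logits):
    """Standard GAN loss for D, halved to slow D's updates.

    d_real_logits / d_fake_logits are the discriminator outputs on
    (condition, real) and (condition, generated) pairs respectively.
    """
    loss_real = bce(d_real_logits, torch.ones_like(d_real_logits))
    loss_fake = bce(d_fake_logits, torch.zeros_like(d_fake_logits))
    return 0.5 * (loss_real + loss_fake)
```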
In this framework, the generator has a U-Net architecture. All convolution blocks use a kernel size of 4, and there are no pooling layers; downsampling and upsampling are done with strided convolutions. Feeding random noise z to the generator is not effective in this case, so dropout with a high probability of 0.5 is used instead to add stochasticity to the outputs. Consequently, dropout is not turned off at inference time.
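Here is a sketch of one decoder ("up") block with the dropout trick; the `UpBlock` name and the exact layer order are assumptions of this sketch, not necessarily what the notebook uses:

```python
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    """One U-Net decoder block: upsample by a strided transposed convolution
    (kernel size 4, no pooling), then concatenate the encoder skip feature."""

    def __init__(self, in_channels, out_channels, dropout=False):
        super().__init__()
        layers = [
            nn.ConvTranspose2d(in_channels, out_channels, kernel_size=4,
                               stride=2, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
        ]
        if dropout:
            # Dropout stands in for the noise z and stays active at inference.
            layers.append(nn.Dropout(0.5))
        layers.append(nn.ReLU(inplace=True))
        self.block = nn.Sequential(*layers)

    def forward(self, x, skip):
        x = self.block(x)
        return torch.cat([x, skip], dim=1)  # U-Net skip connection
```

In PyTorch terms, keeping dropout on at sampling time means leaving the generator (or at least its dropout layers) in `train()` mode rather than calling `eval()` before generating images.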
The contribution of the L1 loss to the generator's objective is controlled by the hyper-parameter lambda. As is common in GAN training, the generator maximizes log(D(G(x))) instead of minimizing log(1 - D(G(x))), since the latter saturates and provides weak gradients early in training.
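Putting the two terms together, a sketch of the generator objective could look like this (the `generator_loss` helper is illustrative; lambda = 100 is the value used in the Pix2Pix paper):

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()
l1 = nn.L1Loss()
LAMBDA = 100  # weight of the L1 term, as in the Pix2Pix paper

def generator_loss(d_fake_logits, fake_images, real_images):
    # Non-saturating GAN loss: label the fakes as real so that G maximizes
    # log D(G(x)), which gives stronger gradients than minimizing log(1 - D(G(x))).
    adv = bce(d_fake_logits, torch.ones_like(d_fake_logits))
    recon = l1(fake_images, real_images)
    return adv + LAMBDA * recon
```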
- Official PyTorch Pix2Pix implementation: https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/
- Original Pix2Pix paper: https://arxiv.org/pdf/1611.07004
- Kaggle notebook: https://www.kaggle.com/code/mohammadshafizd/pix2pix-conditional-gan-in-cityscapes
- Medium post: https://medium.com/@mohammadshafizd/pix2pix-in-cityscapes-dataset-e4d743b595b6