1. Create a CNN that maps an image to a latent vector z.
2. Train it by minimizing the perceptual loss between the input image and the image StyleGAN generates from z (unclear: perceptual loss).
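The two steps can be sketched in PyTorch as below. This is a minimal sketch under stated assumptions, not the actual setup: `ToyGenerator` is a hypothetical stand-in for a pretrained StyleGAN generator, `ToyFeatures` is a hypothetical stand-in for a pretrained feature network, and the 32x32 image size and 64-dimensional z are chosen only to keep the example small. On the open question in the notes: "perceptual loss" usually means comparing the two images in the feature space of a fixed, pretrained network (often VGG) instead of pixel by pixel, and that reading is what the sketch assumes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM = 64  # assumption for the sketch; not taken from the notes


class Encoder(nn.Module):
    """Step 1: a small CNN mapping a 3x32x32 image to a latent vector z."""

    def __init__(self, latent_dim=LATENT_DIM):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
        )
        self.fc = nn.Linear(32 * 8 * 8, latent_dim)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))


class ToyGenerator(nn.Module):
    """Hypothetical stand-in for a pretrained StyleGAN generator (z -> image)."""

    def __init__(self, latent_dim=LATENT_DIM):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 3 * 32 * 32)

    def forward(self, z):
        return torch.tanh(self.fc(z)).view(-1, 3, 32, 32)


class ToyFeatures(nn.Module):
    """Hypothetical stand-in for a pretrained feature network such as VGG."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 8, 3, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)


def freeze(module):
    """Disable gradients for a fixed network and put it in eval mode."""
    for p in module.parameters():
        p.requires_grad_(False)
    return module.eval()


def perceptual_loss(features, x, y):
    """Mean squared distance between feature maps of two image batches."""
    return F.mse_loss(features(x), features(y))


def train_step(encoder, generator, features, optimizer, images):
    """Step 2: one update of the encoder; generator and features stay fixed."""
    optimizer.zero_grad()
    z = encoder(images)                 # image -> z
    recon = generator(z)                # z -> image via the frozen generator
    loss = perceptual_loss(features, recon, images)
    loss.backward()                     # gradients reach only the encoder
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    encoder = Encoder()
    generator = freeze(ToyGenerator())
    features = freeze(ToyFeatures())
    optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)
    batch = torch.rand(4, 3, 32, 32) * 2 - 1  # random images in [-1, 1]
    for step in range(5):
        print(step, train_step(encoder, generator, features, optimizer, batch))
```

Only the encoder's parameters are passed to the optimizer, so the generator and the feature network act as fixed functions that the loss is backpropagated through. With a real StyleGAN, the stand-ins would be replaced by the pretrained generator and a pretrained feature extractor, and the image size and latent dimension would have to match that generator.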