CMSC 178IP

Module 11: Generative Models


Generative Models

CMSC 178IP - Module 11

Noel Jeffrey Pinton
Department of Computer Science
University of the Philippines Cebu


Learning Objectives

By the end of this module, you will be able to:

  1. Distinguish between generative and discriminative models
  2. Understand autoencoder and VAE architectures
  3. Explain GAN architecture and training dynamics
  4. Implement image-to-image translation
  5. Apply style transfer techniques

Introduction to Generative Models

Creating new data from learned distributions

Generative vs Discriminative

Discriminative
  • Learn P(y|x)
  • Used for classification and regression

Generative
  • Learn P(x) or P(x|y)
  • Can sample new data

Latent Space Concept

Latent Space: A compressed representation in which similar items lie close together. Meaningful directions in latent space correspond to semantic attributes.

Knowledge Check

Think About It

Why is the latent space important in generative models?


Autoencoders

Learning compressed representations

Autoencoder Architecture

Autoencoder: Learn the identity function through a bottleneck.

  • Encoder: Compress the input to a latent code
  • Decoder: Reconstruct the input from the latent code
  • Loss: Reconstruction error (e.g., MSE)
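The pieces above can be sketched as a minimal fully-connected autoencoder in PyTorch (an illustrative sketch, not code from the slides; the layer sizes and latent dimension are arbitrary choices):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        # Encoder: compress a flattened 28x28 input to a latent code
        self.encoder = nn.Sequential(
            nn.Linear(784, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),              # the bottleneck
        )
        # Decoder: reconstruct the image from the latent code
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 784), nn.Sigmoid(),       # pixels in [0, 1]
        )

    def forward(self, x):
        z = self.encoder(x.view(x.size(0), -1))      # compress
        return self.decoder(z).view_as(x)            # reconstruct

model = Autoencoder()
x = torch.rand(8, 1, 28, 28)                         # dummy image batch
recon = model(x)
loss = nn.functional.mse_loss(recon, x)              # reconstruction error
```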

Variational Autoencoders

Probabilistic latent representations

VAE Architecture

VAE: Encode each input to a distribution (μ, σ), not a single point.
Sampling from the latent distribution enables generation.

VAE Encoder

The encoder outputs the parameters of the latent distribution:

$$q(z|x) = \mathcal{N}(\mu(x), \sigma(x)^2)$$

VAE Sampling (Reparameterization)


Reparameterization Trick:

$$z = \mu + \sigma \cdot \epsilon, \quad \epsilon \sim \mathcal{N}(0, 1)$$

Enables backpropagation through sampling operation.
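A sketch of the trick in PyTorch (illustrative, not from the slides; encoders commonly predict log-variance rather than σ for numerical stability):

```python
import torch

def reparameterize(mu, logvar):
    std = torch.exp(0.5 * logvar)    # sigma recovered from log-variance
    eps = torch.randn_like(std)      # epsilon ~ N(0, I); randomness lives here
    return mu + std * eps            # z is differentiable w.r.t. mu and sigma

mu = torch.zeros(4, 16, requires_grad=True)
logvar = torch.zeros(4, 16, requires_grad=True)
z = reparameterize(mu, logvar)
z.sum().backward()                   # gradients flow through the sample
```

Because ε carries all the randomness, μ and σ appear only in a deterministic expression, so backpropagation works as usual.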

Knowledge Check

Think About It

Why is the reparameterization trick necessary in VAEs?


VAE Decoder

Decoder: Maps a latent code z back to image space. Generation: sample z ~ N(0, I), then decode to an image.

GANs

Generative Adversarial Networks

GAN Architecture


GAN: Two networks in competition.

  • Generator: Creates fake images from noise
  • Discriminator: Distinguishes real from fake

GAN Generator

Generator Goal: Learn to produce images indistinguishable from real data by mapping random noise z into image space.

GAN Discriminator

Discriminator Goal: Correctly classify images as real or fake, providing the learning signal for the generator.

GAN Game Theory

Minimax Game:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$

G minimizes this objective while D maximizes it. At the Nash equilibrium, D can no longer distinguish real from fake.
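In practice the minimax objective is usually implemented with binary cross-entropy; a sketch (illustrative, assuming the discriminator outputs probabilities in (0, 1)):

```python
import torch
import torch.nn.functional as F

def d_loss(d_real, d_fake):
    # D maximizes log D(x) + log(1 - D(G(z))), i.e. minimizes BCE with
    # target 1 for real outputs and target 0 for fake outputs
    return (F.binary_cross_entropy(d_real, torch.ones_like(d_real))
            + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))

def g_loss(d_fake):
    # Non-saturating variant: maximize log D(G(z)) instead of minimizing
    # log(1 - D(G(z))); this gives stronger gradients early in training
    return F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))

# A confident discriminator (0.9 on real, 0.1 on fake) has a low loss
example_d = d_loss(torch.full((4, 1), 0.9), torch.full((4, 1), 0.1))
```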

GAN Training Dynamics

Training alternates between the two networks:
  1. Train D on real + fake images (labeled real/fake)
  2. Train G to fool D
  3. Repeat
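One alternating step might look like this (a self-contained sketch with toy stand-in networks; real GANs use image-shaped data and deeper models):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim = 8
G = nn.Linear(latent_dim, 16)                      # toy generator
D = nn.Sequential(nn.Linear(16, 1), nn.Sigmoid())  # toy discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.randn(32, 16)                         # stand-in real batch
fake = G(torch.randn(32, latent_dim))

# Step 1: train D on real + fake (fake is detached so G is untouched)
loss_d = (F.binary_cross_entropy(D(real), torch.ones(32, 1))
          + F.binary_cross_entropy(D(fake.detach()), torch.zeros(32, 1)))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Step 2: train G to fool D (gradients flow through D into G)
loss_g = F.binary_cross_entropy(D(fake), torch.ones(32, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```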

Mode Collapse

Mode Collapse: The generator produces limited variety, generating only the few kinds of samples that easily fool the discriminator.

Solutions: Wasserstein loss, spectral normalization, progressive training.

Knowledge Check

Think About It

What is mode collapse and why is it problematic?

Click the blurred area to reveal the answer

Training Tips

  • Use batch normalization
  • Leaky ReLU in discriminator
  • Adam optimizer (low learning rate)
  • Label smoothing
  • Balance G and D training

VAE vs GAN

VAE
  • Stable training
  • Blurry outputs
  • Explicit density

GAN
  • Harder to train
  • Sharp outputs
  • Implicit density

Applications

Real-world generative applications

Generation Examples


Faces, art, objects generated by modern GANs

Latent Interpolation

Interpolation: Smooth transitions between generated images, obtained by interpolating between their codes in latent space.
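A linear interpolation sketch (illustrative; `decoder` stands in for any trained generator or VAE decoder, so the decode line is left commented):

```python
import torch

z1, z2 = torch.randn(100), torch.randn(100)     # two latent codes
steps = torch.linspace(0.0, 1.0, 8)
zs = torch.stack([(1 - t) * z1 + t * z2 for t in steps])  # shape (8, 100)
# frames = decoder(zs)   # each row decodes to one frame of the transition
```

For Gaussian latent spaces, spherical interpolation (slerp) is often preferred over linear interpolation because it stays closer to the typical norm of sampled codes.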

Conditional Generation


Conditional GAN (cGAN): Control generation with class labels or other conditions. Generate specific types of images on demand.
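A common way to condition the generator is to concatenate a label embedding to the noise vector (a sketch; the sizes are arbitrary choices):

```python
import torch
import torch.nn as nn

n_classes, latent_dim = 10, 100
embed = nn.Embedding(n_classes, n_classes)          # learnable label embedding
z = torch.randn(16, latent_dim)
labels = torch.randint(0, n_classes, (16,))
gen_input = torch.cat([z, embed(labels)], dim=1)    # shape (16, 110)
# The generator's first layer then takes latent_dim + n_classes inputs;
# the discriminator is conditioned on the label in the same way.
```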

Image-to-Image Translation

pix2pix (paired data) and CycleGAN (unpaired data):

  • Sketch → Photo
  • Day → Night
  • Horse → Zebra
  • Satellite → Map

Style Transfer

Neural Style Transfer: Apply the artistic style of one image to the content of another.

Minimize: content loss + style loss (computed from Gram matrices of CNN features)
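The style term relies on Gram matrices of CNN feature maps; a sketch of the computation (illustrative, with random features standing in for real VGG activations):

```python
import torch

def gram_matrix(features):
    # features: (batch, channels, H, W) activations from one CNN layer
    b, c, h, w = features.shape
    f = features.view(b, c, h * w)
    # Channel-by-channel correlations capture texture, discarding layout
    return f @ f.transpose(1, 2) / (c * h * w)

feats = torch.randn(1, 64, 32, 32)
G = gram_matrix(feats)            # shape (1, 64, 64), symmetric
# Style loss = MSE between Gram matrices of the generated and style images
```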

Applications Overview

Applications

Generative models: art, entertainment, data augmentation, super-resolution

Implementation

import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, latent_dim=100):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 784),   # 784 = 28 * 28 output pixels
            nn.Tanh()              # outputs scaled to [-1, 1]
        )

    def forward(self, z):
        return self.model(z).view(-1, 1, 28, 28)

# Instantiate the generator, then sample a batch of images
generator = Generator()
z = torch.randn(16, 100)           # 16 latent vectors
fake_images = generator(z)         # shape: (16, 1, 28, 28)
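A matching discriminator (a sketch, not from the slides) would mirror the generator, mapping a flattened image to a real/fake probability; note the Leaky ReLU activations recommended in the training tips:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(784, 512),
            nn.LeakyReLU(0.2),     # Leaky ReLU in the discriminator
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid()           # probability that the input is real
        )

    def forward(self, img):
        return self.model(img.view(img.size(0), -1))

discriminator = Discriminator()
scores = discriminator(torch.randn(16, 1, 28, 28))   # shape: (16, 1)
```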

Summary

Key Takeaways

Key Takeaways

  1. Autoencoders: Learn compressed representations via reconstruction
  2. VAE: Probabilistic latent space enables generation
  3. GAN: Generator vs discriminator adversarial training
  4. Mode collapse: Major GAN training challenge
  5. Applications: Image generation, translation, style transfer
  6. Future: Diffusion models, large-scale generation

Course Complete!

Congratulations!

You've completed all modules of CMSC 178IP - Digital Image Processing


Good luck with your final projects!

End of Module 11

Generative Models

Questions?