CMSC 178IP

Module 11: Generative Models


Generative Models

CMSC 178IP - Module 11

Noel Jeffrey Pinton
Department of Computer Science
University of the Philippines Cebu


Learning Objectives

By the end of this module, you will be able to:

  1. Distinguish between generative and discriminative models
  2. Understand autoencoder and VAE architectures
  3. Explain GAN architecture and training dynamics
  4. Implement image-to-image translation
  5. Apply style transfer techniques

Introduction to Generative Models

Creating new data from learned distributions

Generative vs Discriminative

Discriminative
  • Learn P(y|x)
  • Used for classification and regression

Generative
  • Learn P(x) or P(x|y)
  • Can sample new data

Latent Space Concept

Latent Space: A compressed representation in which similar items lie close together. Meaningful directions in latent space correspond to semantic attributes.

Knowledge Check

Think About It

Why is the latent space important in generative models?


Autoencoders

Learning compressed representations

Autoencoder Architecture

Autoencoder: Learn the identity function through a bottleneck.

  • Encoder: Compress the input to a latent code
  • Decoder: Reconstruct the input from the latent code
  • Loss: Reconstruction error (e.g., MSE)
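The pieces above can be sketched as a minimal fully-connected autoencoder in PyTorch (an illustrative sketch, not code from the slides; the layer sizes and latent dimension are arbitrary choices):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        # Encoder: compress a flattened 28x28 input to a latent code
        self.encoder = nn.Sequential(
            nn.Linear(784, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),              # the bottleneck
        )
        # Decoder: reconstruct the image from the latent code
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 784), nn.Sigmoid(),       # pixels in [0, 1]
        )

    def forward(self, x):
        z = self.encoder(x.view(x.size(0), -1))      # compress
        return self.decoder(z).view_as(x)            # reconstruct

model = Autoencoder()
x = torch.rand(8, 1, 28, 28)                         # dummy image batch
recon = model(x)
loss = nn.functional.mse_loss(recon, x)              # reconstruction error
```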

Variational Autoencoders

Probabilistic latent representations

VAE Architecture

VAE: Encode each input to a distribution (μ, σ), not a single point.
Sampling from the latent distribution enables generation.

VAE Encoder

The encoder outputs the parameters of the latent distribution:

$$q(z|x) = \mathcal{N}(\mu(x), \sigma(x)^2)$$

VAE Sampling (Reparameterization)


Reparameterization Trick:

$$z = \mu + \sigma \cdot \epsilon, \quad \epsilon \sim \mathcal{N}(0, 1)$$

Enables backpropagation through sampling operation.
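A sketch of the trick in PyTorch (illustrative, not from the slides; encoders commonly predict log-variance rather than σ for numerical stability):

```python
import torch

def reparameterize(mu, logvar):
    std = torch.exp(0.5 * logvar)    # sigma recovered from log-variance
    eps = torch.randn_like(std)      # epsilon ~ N(0, I); randomness lives here
    return mu + std * eps            # z is differentiable w.r.t. mu and sigma

mu = torch.zeros(4, 16, requires_grad=True)
logvar = torch.zeros(4, 16, requires_grad=True)
z = reparameterize(mu, logvar)
z.sum().backward()                   # gradients flow through the sample
```

Because ε carries all the randomness, μ and σ appear only in a deterministic expression, so backpropagation works as usual.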

Knowledge Check

Think About It

Why is the reparameterization trick necessary in VAEs?


VAE Decoder

Decoder: Maps a latent code z back to image space. Generation: sample z ~ N(0, I), then decode to an image.

GANs

Generative Adversarial Networks

GAN Architecture


GAN: Two networks in competition.

  • Generator: Creates fake images from noise
  • Discriminator: Distinguishes real from fake

GAN Generator

Generator Goal: Learn to produce images indistinguishable from real data by mapping random noise z into image space.

GAN Discriminator

Discriminator Goal: Correctly classify images as real or fake, providing the learning signal for the generator.

GAN Game Theory

Minimax Game:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$

G minimizes this objective while D maximizes it. At the Nash equilibrium, D can no longer distinguish real from fake.
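In practice the minimax objective is usually implemented with binary cross-entropy; a sketch (illustrative, assuming the discriminator outputs probabilities in (0, 1)):

```python
import torch
import torch.nn.functional as F

def d_loss(d_real, d_fake):
    # D maximizes log D(x) + log(1 - D(G(z))), i.e. minimizes BCE with
    # target 1 for real outputs and target 0 for fake outputs
    return (F.binary_cross_entropy(d_real, torch.ones_like(d_real))
            + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))

def g_loss(d_fake):
    # Non-saturating variant: maximize log D(G(z)) instead of minimizing
    # log(1 - D(G(z))); this gives stronger gradients early in training
    return F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))

# A confident discriminator (0.9 on real, 0.1 on fake) has a low loss
example_d = d_loss(torch.full((4, 1), 0.9), torch.full((4, 1), 0.1))
```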

GAN Training Dynamics

Training alternates between the two networks:
  1. Train D on real + fake images (labeled real/fake)
  2. Train G to fool D
  3. Repeat
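One alternating step might look like this (a self-contained sketch with toy stand-in networks; real GANs use image-shaped data and deeper models):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim = 8
G = nn.Linear(latent_dim, 16)                      # toy generator
D = nn.Sequential(nn.Linear(16, 1), nn.Sigmoid())  # toy discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.randn(32, 16)                         # stand-in real batch
fake = G(torch.randn(32, latent_dim))

# Step 1: train D on real + fake (fake is detached so G is untouched)
loss_d = (F.binary_cross_entropy(D(real), torch.ones(32, 1))
          + F.binary_cross_entropy(D(fake.detach()), torch.zeros(32, 1)))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Step 2: train G to fool D (gradients flow through D into G)
loss_g = F.binary_cross_entropy(D(fake), torch.ones(32, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```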

Mode Collapse

Mode Collapse: The generator produces limited variety, generating only the few kinds of samples that easily fool the discriminator.

Solutions: Wasserstein loss, spectral normalization, progressive training.

Knowledge Check

Think About It

What is mode collapse and why is it problematic?

Click the blurred area to reveal the answer

Training Tips

  • Use batch normalization
  • Leaky ReLU in discriminator
  • Adam optimizer (low learning rate)
  • Label smoothing
  • Balance G and D training

VAE vs GAN

VAE
  • Stable training
  • Blurry outputs
  • Explicit density

GAN
  • Harder to train
  • Sharp outputs
  • Implicit density

Applications

Real-world generative applications

Generation Examples


Faces, art, objects generated by modern GANs

Latent Interpolation

Interpolation: Smooth transitions between generated images, obtained by interpolating between their codes in latent space.
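A linear interpolation sketch (illustrative; `decoder` stands in for any trained generator or VAE decoder, so the decode line is left commented):

```python
import torch

z1, z2 = torch.randn(100), torch.randn(100)     # two latent codes
steps = torch.linspace(0.0, 1.0, 8)
zs = torch.stack([(1 - t) * z1 + t * z2 for t in steps])  # shape (8, 100)
# frames = decoder(zs)   # each row decodes to one frame of the transition
```

For Gaussian latent spaces, spherical interpolation (slerp) is often preferred over linear interpolation because it stays closer to the typical norm of sampled codes.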

Conditional Generation


Conditional GAN (cGAN): Control generation with class labels or other conditions. Generate specific types of images on demand.
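A common way to condition the generator is to concatenate a label embedding to the noise vector (a sketch; the sizes are arbitrary choices):

```python
import torch
import torch.nn as nn

n_classes, latent_dim = 10, 100
embed = nn.Embedding(n_classes, n_classes)          # learnable label embedding
z = torch.randn(16, latent_dim)
labels = torch.randint(0, n_classes, (16,))
gen_input = torch.cat([z, embed(labels)], dim=1)    # shape (16, 110)
# The generator's first layer then takes latent_dim + n_classes inputs;
# the discriminator is conditioned on the label in the same way.
```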

Image-to-Image Translation

pix2pix (paired data) and CycleGAN (unpaired data):

  • Sketch → Photo
  • Day → Night
  • Horse → Zebra
  • Satellite → Map

Style Transfer

Neural Style Transfer: Apply the artistic style of one image to the content of another.

Minimize: content loss + style loss (computed from Gram matrices of CNN features)
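The style term relies on Gram matrices of CNN feature maps; a sketch of the computation (illustrative, with random features standing in for real VGG activations):

```python
import torch

def gram_matrix(features):
    # features: (batch, channels, H, W) activations from one CNN layer
    b, c, h, w = features.shape
    f = features.view(b, c, h * w)
    # Channel-by-channel correlations capture texture, discarding layout
    return f @ f.transpose(1, 2) / (c * h * w)

feats = torch.randn(1, 64, 32, 32)
G = gram_matrix(feats)            # shape (1, 64, 64), symmetric
# Style loss = MSE between Gram matrices of the generated and style images
```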

Applications Overview

Applications

Generative models: art, entertainment, data augmentation, super-resolution

Implementation

import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, latent_dim=100):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 784),   # 784 = 28 * 28 output pixels
            nn.Tanh()              # outputs scaled to [-1, 1]
        )

    def forward(self, z):
        return self.model(z).view(-1, 1, 28, 28)

# Instantiate the generator, then sample a batch of images
generator = Generator()
z = torch.randn(16, 100)           # 16 latent vectors
fake_images = generator(z)         # shape: (16, 1, 28, 28)
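A matching discriminator (a sketch, not from the slides) would mirror the generator, mapping a flattened image to a real/fake probability; note the Leaky ReLU activations recommended in the training tips:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(784, 512),
            nn.LeakyReLU(0.2),     # Leaky ReLU in the discriminator
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid()           # probability that the input is real
        )

    def forward(self, img):
        return self.model(img.view(img.size(0), -1))

discriminator = Discriminator()
scores = discriminator(torch.randn(16, 1, 28, 28))   # shape: (16, 1)
```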

Summary

Key Takeaways

Key Takeaways

  1. Autoencoders: Learn compressed representations via reconstruction
  2. VAE: Probabilistic latent space enables generation
  3. GAN: Generator vs discriminator adversarial training
  4. Mode collapse: Major GAN training challenge
  5. Applications: Image generation, translation, style transfer
  6. Future: Diffusion models, large-scale generation

Course Complete!

Congratulations!

You've completed all modules of CMSC 178IP - Digital Image Processing


Good luck with your final projects!

End of Module 11

Generative Models

Questions?