Noel Jeffrey Pinton
Department of Computer Science
University of the Philippines Cebu
By the end of this module, you will be able to:
Create new data from learned distributions
Latent Space: Compressed representation where similar items are close together. Meaningful directions in latent space correspond to semantic attributes.
Why is the latent space important in generative models?
The latent space provides a compressed, structured representation of data. Points in latent space can be sampled to generate new data, and interpolating between points creates smooth transitions. Well-organized latent spaces also allow for semantic manipulation.
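The smooth-transition property above can be sketched directly: linearly interpolating between two latent points yields a path whose intermediate vectors a trained decoder would map to gradually changing outputs. The latent dimension of 100 here is illustrative, matching the Generator later in this module.

```python
import torch

def interpolate(z1, z2, steps=8):
    """Linearly interpolate between two latent points z1 and z2."""
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1)
    return (1 - alphas) * z1 + alphas * z2  # shape: (steps, latent_dim)

z1, z2 = torch.randn(1, 100), torch.randn(1, 100)
path = interpolate(z1, z2)
print(path.shape)  # torch.Size([8, 100])
```

Each row of `path` could be fed to a decoder or generator to visualize the transition from one sample to another.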
Learning compressed representations
Autoencoder: Learns the identity function through a bottleneck; the narrow middle layer forces the network to discover a compressed representation.
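A minimal sketch of the bottleneck idea, assuming flattened 28x28 inputs and an illustrative 32-dimensional code (the sizes are not prescribed by the slides):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, bottleneck=32):
        super().__init__()
        # Encoder compresses 784 -> 32; decoder reconstructs 32 -> 784
        self.encoder = nn.Sequential(
            nn.Linear(784, 128), nn.ReLU(),
            nn.Linear(128, bottleneck),
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, 128), nn.ReLU(),
            nn.Linear(128, 784), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

x = torch.rand(4, 784)
recon = Autoencoder()(x)
print(recon.shape)  # torch.Size([4, 784])
```

Training minimizes reconstruction error (e.g. MSE between `x` and `recon`), so the only way information reaches the output is through the 32-dimensional code.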
Probabilistic latent representations
Encoder outputs parameters of latent distribution:
$$q(z|x) = \mathcal{N}(\mu(x), \sigma(x)^2)$$
Reparameterization Trick:
$$z = \mu + \sigma \cdot \epsilon, \quad \epsilon \sim \mathcal{N}(0, 1)$$
Enables backpropagation through sampling operation.
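The trick is a one-liner in code. A common convention (an implementation choice, not mandated by the formula) is to have the encoder output log-variance for numerical stability:

```python
import torch

def reparameterize(mu, log_var):
    """z = mu + sigma * eps with eps ~ N(0, 1).
    eps carries the randomness, so z is differentiable w.r.t. mu and sigma."""
    std = torch.exp(0.5 * log_var)
    eps = torch.randn_like(std)
    return mu + std * eps

mu = torch.zeros(4, 20, requires_grad=True)
log_var = torch.zeros(4, 20, requires_grad=True)
z = reparameterize(mu, log_var)
z.sum().backward()  # gradients flow back through mu and log_var
print(mu.grad is not None)  # True
```

Sampling `z` directly from `N(mu, sigma^2)` would block these gradients; rewriting the sample as a deterministic function of `(mu, sigma, eps)` restores them.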
Why is the reparameterization trick necessary in VAEs?
Sampling from a distribution is not differentiable. The reparameterization trick moves the stochasticity to an input (ε) so the model can backpropagate gradients through μ and σ, enabling end-to-end training.
Generative Adversarial Networks
GAN: Two networks in competition.
Minimax Game:
$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$
G minimizes, D maximizes. At the Nash equilibrium, D cannot distinguish real from generated samples (D outputs 1/2 everywhere).
Mode Collapse: Generator produces limited variety. Only generates samples that easily fool discriminator.
Solutions: Wasserstein loss, spectral normalization, progressive training.
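One alternating round of the minimax game can be sketched with tiny placeholder networks standing in for a real G and D (the shapes and learning rate here are illustrative). Note the G step uses the common non-saturating heuristic, maximizing log D(G(z)) instead of minimizing log(1 - D(G(z))):

```python
import torch
import torch.nn as nn

G = nn.Linear(10, 2)                              # toy generator: z -> sample
D = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())  # toy discriminator
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real = torch.randn(16, 2)
z = torch.randn(16, 10)

# D step: push D(real) toward 1 and D(fake) toward 0.
# detach() stops this update from reaching G's parameters.
fake = G(z).detach()
d_loss = bce(D(real), torch.ones(16, 1)) + bce(D(fake), torch.zeros(16, 1))
opt_D.zero_grad(); d_loss.backward(); opt_D.step()

# G step: push D(G(z)) toward 1, i.e. fool the discriminator.
g_loss = bce(D(G(z)), torch.ones(16, 1))
opt_G.zero_grad(); g_loss.backward(); opt_G.step()
```

A full training loop repeats these two steps over minibatches; if G starts winning with only a handful of output modes, that is mode collapse.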
What is mode collapse and why is it problematic?
Mode collapse occurs when the generator learns to produce only a few types of outputs that fool the discriminator, ignoring the full diversity of the real data distribution. The generated samples lack variety and don't represent the true data distribution.
Real-world generative applications
Faces, art, objects generated by modern GANs
Conditional GAN (cGAN): Control generation with class labels or other conditions. Generate specific types of images on demand.
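One common way to inject the condition (a sketch, not the only cGAN design) is to embed the class label and concatenate it with z before the first layer; the embedding size and layer widths below are illustrative:

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, latent_dim=100, n_classes=10, embed_dim=10):
        super().__init__()
        self.embed = nn.Embedding(n_classes, embed_dim)
        self.model = nn.Sequential(
            nn.Linear(latent_dim + embed_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 784),
            nn.Tanh(),
        )

    def forward(self, z, labels):
        # Concatenate noise with the label embedding to condition generation
        zc = torch.cat([z, self.embed(labels)], dim=1)
        return self.model(zc).view(-1, 1, 28, 28)

g = ConditionalGenerator()
z = torch.randn(4, 100)
labels = torch.tensor([0, 1, 2, 3])  # request four specific classes
imgs = g(z, labels)
print(imgs.shape)  # torch.Size([4, 1, 28, 28])
```

The discriminator is conditioned the same way, so both networks see the label and the generator is rewarded only for realistic samples of the requested class.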
Neural Style Transfer: Apply artistic style of one image to content of another.
Minimize: Content loss + Style loss (Gram matrices)
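The Gram matrix behind the style loss is straightforward to compute: for a feature map of shape (B, C, H, W), it is the matrix of channel-wise inner products, which captures texture statistics while discarding spatial layout (the 1/(CHW) normalization is one common convention):

```python
import torch

def gram_matrix(features):
    """Gram matrix of a (B, C, H, W) feature map: channel correlations."""
    b, c, h, w = features.shape
    f = features.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)  # shape: (B, C, C)

feats = torch.randn(1, 64, 32, 32)
G = gram_matrix(feats)
print(G.shape)  # torch.Size([1, 64, 64])
```

The style loss is then the mean squared difference between Gram matrices of the generated image and the style image at chosen network layers.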
Generative models: art, entertainment, data augmentation, super-resolution
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, latent_dim=100):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 784),
            nn.Tanh()
        )

    def forward(self, z):
        return self.model(z).view(-1, 1, 28, 28)

# Generate images from random latent vectors
generator = Generator()
z = torch.randn(16, 100)
fake_images = generator(z)  # shape: (16, 1, 28, 28)
Key Takeaways
Congratulations!
You've completed all modules of CMSC 178IP - Digital Image Processing
Good luck with your final projects!
Questions?