CMSC 178IP

Module 09: Computer Vision & Deep Learning I


Computer Vision & Deep Learning I

CMSC 178IP - Module 09

Noel Jeffrey Pinton
Department of Computer Science
University of the Philippines Cebu

Learning Objectives

By the end of this module, you will be able to:

  1. Understand neural network fundamentals (perceptron, MLP)
  2. Explain activation functions and their purposes
  3. Describe gradient descent and backpropagation
  4. Understand CNN architecture (convolution, pooling)
  5. Prepare data for deep learning models

Neural Network Basics

From perceptron to deep networks

The Perceptron

Perceptron:

$$y = \sigma\left(\sum_{i=1}^{n} w_i x_i + b\right)$$

A weighted sum of the inputs plus a bias, passed through an activation function σ.
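As a sketch, the perceptron above can be written directly in plain Python (the input, weight, and bias values here are arbitrary illustrative choices):

```python
import math

def perceptron(x, w, b):
    # weighted sum of inputs plus bias, passed through a sigmoid activation
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid

y = perceptron(x=[1.0, 2.0], w=[0.5, -0.25], b=0.0)  # z = 0, so y = 0.5
```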

Activation Functions

Sigmoid

$$\sigma(x) = \frac{1}{1+e^{-x}}$$

ReLU

$$f(x) = \max(0, x)$$
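Both activations are one-liners; the key difference is that sigmoid saturates (its gradient is at most 0.25), while ReLU passes gradients through unchanged for positive inputs:

```python
import math

def sigmoid(x):
    # squashes any real input into (0, 1); saturates for large |x|
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    # passes positive inputs through unchanged, zeroes out negatives
    return max(0.0, x)
```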

Knowledge Check

Think About It

Why is ReLU preferred over sigmoid in deep networks?

Multi-Layer Perceptron

MLP Architecture

MLP: Multiple layers of neurons connected in sequence.

  • Input layer: Receives data
  • Hidden layers: Learn representations
  • Output layer: Produces predictions
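A minimal sketch of such an MLP in PyTorch; the layer sizes here are illustrative assumptions (e.g. a flattened 28×28 grayscale image with 10 output classes):

```python
import torch.nn as nn

mlp = nn.Sequential(
    nn.Linear(784, 128),  # input layer -> first hidden layer
    nn.ReLU(),
    nn.Linear(128, 64),   # second hidden layer
    nn.ReLU(),
    nn.Linear(64, 10),    # output layer: one score per class
)
```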

Training Neural Networks

Learning from data

Loss Functions

MSE (Regression)

$$L = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$

Cross-Entropy (Classification)

$$L = -\sum_{i} y_i \log(\hat{y}_i)$$
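Both losses can be computed directly from their definitions (here y is a one-hot target vector and ŷ a vector of predicted probabilities):

```python
import math

def mse(y, y_hat):
    # mean squared error over n predictions
    return sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) / len(y)

def cross_entropy(y, y_hat):
    # y: one-hot targets, y_hat: predicted probabilities (must be > 0)
    return -sum(yi * math.log(yh) for yi, yh in zip(y, y_hat) if yi > 0)
```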

Gradient Descent

Parameter Update:

$$w = w - \eta \frac{\partial L}{\partial w}$$

η is the learning rate; each update steps in the direction that reduces the loss.
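The update rule can be demonstrated on a toy loss L(w) = (w − 3)², whose gradient is 2(w − 3); gradient descent should converge to the minimum at w = 3:

```python
def gradient_descent(grad, w0, eta=0.1, steps=100):
    w = w0
    for _ in range(steps):
        w = w - eta * grad(w)  # step against the gradient
    return w

# minimize L(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)  # converges near 3
```

With η = 0.1 the error shrinks by a factor of 0.8 per step; a much larger η would overshoot and diverge, a much smaller one would barely move, which is exactly the trade-off the knowledge check above asks about.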

Knowledge Check

Think About It

What happens if the learning rate is too large or too small?

Learning Curves

Monitor training loss, validation loss, and accuracy across epochs to detect overfitting: validation loss rising while training loss keeps falling is the classic warning sign.

Overfitting

Overfitting: the model memorizes the training data but fails to generalize to new data.

Solutions: Regularization (L1/L2), dropout, early stopping, data augmentation, more data.
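Two of these remedies, dropout and L2 regularization, are one-liners in PyTorch (the layer sizes here are illustrative assumptions):

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # randomly zeroes activations during training only
    nn.Linear(128, 10),
)

# weight_decay adds an L2 penalty on the weights to the loss
optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
```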

Convolutional Neural Networks

Designed for image data

Convolution Operation

Convolution in CNNs: learnable filters slide across the image, producing feature maps that capture local patterns such as edges and textures.

Pooling Operations

Max Pooling: takes the maximum value in each window, preserving the strongest features.

Average Pooling: takes the average value in each window, giving smoother downsampling.
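The difference shows up on even a single 2×2 window:

```python
import torch
import torch.nn.functional as F

# one 2x2 feature map with batch and channel dimensions: shape (1, 1, 2, 2)
x = torch.tensor([[[[1.0, 2.0],
                    [3.0, 4.0]]]])

max_out = F.max_pool2d(x, kernel_size=2)  # keeps the strongest value: 4.0
avg_out = F.avg_pool2d(x, kernel_size=2)  # averages the window: 2.5
```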

Knowledge Check

Think About It

What are the benefits of pooling layers?

CNN Architecture

Typical CNN:

Input → [Conv → ReLU → Pool] × N → Flatten → FC → Output

Early layers: edges, textures. Deeper layers: complex patterns, objects.

Feature Maps

Visualization: feature maps show the representations the network has learned at each layer, revealing what the CNN responds to in an input image.

Data Preparation

Getting your data ready

Preprocessing Pipeline

Essential steps:

  • Resize to consistent dimensions
  • Normalize pixel values (0-1 or -1 to 1)
  • Split: train/validation/test sets
  • Batch loading for memory efficiency
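As a sketch of the normalization step, assuming images arrive as uint8 tensors with values in [0, 255]:

```python
import torch

def normalize(img_uint8):
    # scale uint8 pixels to [0, 1], then shift to [-1, 1]
    x = img_uint8.float() / 255.0
    return x * 2.0 - 1.0
```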

Data Augmentation

Augmentation: Artificially expand training set by applying transformations.

  • Rotation, flipping, cropping
  • Brightness, contrast adjustments
  • Scaling, shearing
  • Random erasing, cutout
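Random horizontal flipping, one of the most common augmentations, is a few lines of plain torch (torchvision also ships ready-made versions of the transforms listed above):

```python
import torch

def random_horizontal_flip(img, p=0.5):
    # img: C x H x W tensor; flip left-right with probability p
    if torch.rand(1).item() < p:
        return torch.flip(img, dims=[2])  # reverse the width axis
    return img
```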

Classification Example

End-to-end image classification with a CNN

Confusion Matrix

Evaluation: the confusion matrix shows true positives, false positives, true negatives, and false negatives for each class, making it easy to see which classes the model confuses.
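For a small number of classes the matrix is easy to build by hand (rows index the true class, columns the predicted class):

```python
def confusion_matrix(y_true, y_pred, num_classes):
    # cm[t][p] counts samples with true class t predicted as class p
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
# cm == [[1, 1], [0, 2]]: one class-0 sample was misclassified as class 1
```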

Implementation

import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),   # 32x32 -> 16x16
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2)    # 16x16 -> 8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 128),  # assumes 32x32 RGB input (e.g. CIFAR-10)
            nn.ReLU(),
            nn.Linear(128, num_classes)
        )

    def forward(self, x):
        return self.classifier(self.features(x))

Summary

Key Takeaways

Key Takeaways

  1. Perceptron: Basic unit - weighted sum + activation
  2. Activation functions: ReLU (most common), sigmoid, softmax
  3. Training: Minimize loss using gradient descent
  4. CNN: Convolution + pooling for spatial features
  5. Overfitting: Combat with regularization, dropout, augmentation
  6. Data prep: Normalize, augment, split properly

Questions?

Thank you for your attention!


Next: Module 10 - Computer Vision & Deep Learning II

End of Module 09