CMSC 178IP

Module 09: Computer Vision & Deep Learning I


Computer Vision & Deep Learning I

CMSC 178IP - Module 09

Noel Jeffrey Pinton
Department of Computer Science
University of the Philippines Cebu

Learning Objectives

By the end of this module, you will be able to:

  1. Understand neural network fundamentals (perceptron, MLP)
  2. Explain activation functions and their purposes
  3. Describe gradient descent and backpropagation
  4. Understand CNN architecture (convolution, pooling)
  5. Prepare data for deep learning models

Neural Network Basics

From perceptron to deep networks

The Perceptron

Perceptron:

$$y = \sigma\left(\sum_{i=1}^{n} w_i x_i + b\right)$$

A weighted sum of the inputs plus a bias, passed through an activation function σ.
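As a sketch, the perceptron above can be written directly in plain Python (the input, weight, and bias values here are arbitrary illustrative choices):

```python
import math

def perceptron(x, w, b):
    # weighted sum of inputs plus bias, passed through a sigmoid activation
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid

y = perceptron(x=[1.0, 2.0], w=[0.5, -0.25], b=0.0)  # z = 0, so y = 0.5
```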

Activation Functions

Sigmoid

$$\sigma(x) = \frac{1}{1+e^{-x}}$$

ReLU

$$f(x) = \max(0, x)$$
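Both activations are one-liners; the key difference is that sigmoid saturates (its gradient is at most 0.25), while ReLU passes gradients through unchanged for positive inputs:

```python
import math

def sigmoid(x):
    # squashes any real input into (0, 1); saturates for large |x|
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    # passes positive inputs through unchanged, zeroes out negatives
    return max(0.0, x)
```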

Knowledge Check

Think About It

Why is ReLU preferred over sigmoid in deep networks?

Multi-Layer Perceptron

MLP Architecture

MLP: Multiple layers of neurons connected in sequence.

  • Input layer: Receives data
  • Hidden layers: Learn representations
  • Output layer: Produces predictions
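A minimal sketch of such an MLP in PyTorch; the layer sizes here are illustrative assumptions (e.g. a flattened 28×28 grayscale image with 10 output classes):

```python
import torch.nn as nn

mlp = nn.Sequential(
    nn.Linear(784, 128),  # input layer -> first hidden layer
    nn.ReLU(),
    nn.Linear(128, 64),   # second hidden layer
    nn.ReLU(),
    nn.Linear(64, 10),    # output layer: one score per class
)
```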

Training Neural Networks

Learning from data

Loss Functions

MSE (Regression)

$$L = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$

Cross-Entropy (Classification)

$$L = -\sum_{i} y_i \log(\hat{y}_i)$$
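Both losses can be computed directly from their definitions (here y is a one-hot target vector and ŷ a vector of predicted probabilities):

```python
import math

def mse(y, y_hat):
    # mean squared error over n predictions
    return sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) / len(y)

def cross_entropy(y, y_hat):
    # y: one-hot targets, y_hat: predicted probabilities (must be > 0)
    return -sum(yi * math.log(yh) for yi, yh in zip(y, y_hat) if yi > 0)
```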

Gradient Descent

Parameter Update:

$$w = w - \eta \frac{\partial L}{\partial w}$$

η is the learning rate; each update steps in the direction that reduces the loss.
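The update rule can be demonstrated on a toy loss L(w) = (w − 3)², whose gradient is 2(w − 3); gradient descent should converge to the minimum at w = 3:

```python
def gradient_descent(grad, w0, eta=0.1, steps=100):
    w = w0
    for _ in range(steps):
        w = w - eta * grad(w)  # step against the gradient
    return w

# minimize L(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)  # converges near 3
```

With η = 0.1 the error shrinks by a factor of 0.8 per step; a much larger η would overshoot and diverge, a much smaller one would barely move, which is exactly the trade-off the knowledge check above asks about.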

Knowledge Check

Think About It

What happens if the learning rate is too large or too small?

Learning Curves

Monitor training loss, validation loss, and accuracy across epochs to detect overfitting: validation loss rising while training loss keeps falling is the classic warning sign.

Overfitting

Overfitting: the model memorizes the training data but fails to generalize to new data.

Solutions: Regularization (L1/L2), dropout, early stopping, data augmentation, more data.
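Two of these remedies, dropout and L2 regularization, are one-liners in PyTorch (the layer sizes here are illustrative assumptions):

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # randomly zeroes activations during training only
    nn.Linear(128, 10),
)

# weight_decay adds an L2 penalty on the weights to the loss
optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
```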

Convolutional Neural Networks

Designed for image data

Convolution Operation

Convolution in CNNs: learnable filters slide across the image, producing feature maps that capture local patterns such as edges and textures.

Pooling Operations

Max Pooling: takes the maximum value in each window, preserving the strongest features.

Average Pooling: takes the average value in each window, giving smoother downsampling.
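The difference shows up on even a single 2×2 window:

```python
import torch
import torch.nn.functional as F

# one 2x2 feature map with batch and channel dimensions: shape (1, 1, 2, 2)
x = torch.tensor([[[[1.0, 2.0],
                    [3.0, 4.0]]]])

max_out = F.max_pool2d(x, kernel_size=2)  # keeps the strongest value: 4.0
avg_out = F.avg_pool2d(x, kernel_size=2)  # averages the window: 2.5
```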

Knowledge Check

Think About It

What are the benefits of pooling layers?

CNN Architecture

Typical CNN:

Input → [Conv → ReLU → Pool] × N → Flatten → FC → Output

Early layers: edges, textures. Deeper layers: complex patterns, objects.

Feature Maps

Visualization: feature maps show the representations the network has learned at each layer, revealing what the CNN responds to in an input image.

Data Preparation

Getting your data ready

Preprocessing Pipeline

Essential steps:

  • Resize to consistent dimensions
  • Normalize pixel values (0-1 or -1 to 1)
  • Split: train/validation/test sets
  • Batch loading for memory efficiency
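As a sketch of the normalization step, assuming images arrive as uint8 tensors with values in [0, 255]:

```python
import torch

def normalize(img_uint8):
    # scale uint8 pixels to [0, 1], then shift to [-1, 1]
    x = img_uint8.float() / 255.0
    return x * 2.0 - 1.0
```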

Data Augmentation

Augmentation: Artificially expand training set by applying transformations.

  • Rotation, flipping, cropping
  • Brightness, contrast adjustments
  • Scaling, shearing
  • Random erasing, cutout
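Random horizontal flipping, one of the most common augmentations, is a few lines of plain torch (torchvision also ships ready-made versions of the transforms listed above):

```python
import torch

def random_horizontal_flip(img, p=0.5):
    # img: C x H x W tensor; flip left-right with probability p
    if torch.rand(1).item() < p:
        return torch.flip(img, dims=[2])  # reverse the width axis
    return img
```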

Classification Example

End-to-end image classification with a CNN

Confusion Matrix

Evaluation: the confusion matrix shows true positives, false positives, true negatives, and false negatives for each class, making it easy to see which classes the model confuses.
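For a small number of classes the matrix is easy to build by hand (rows index the true class, columns the predicted class):

```python
def confusion_matrix(y_true, y_pred, num_classes):
    # cm[t][p] counts samples with true class t predicted as class p
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
# cm == [[1, 1], [0, 2]]: one class-0 sample was misclassified as class 1
```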

Implementation

import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),   # 32x32 -> 16x16
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2)    # 16x16 -> 8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 128),  # assumes 32x32 RGB input (e.g. CIFAR-10)
            nn.ReLU(),
            nn.Linear(128, num_classes)
        )

    def forward(self, x):
        return self.classifier(self.features(x))

Summary

Key Takeaways

Key Takeaways

  1. Perceptron: Basic unit - weighted sum + activation
  2. Activation functions: ReLU (most common), sigmoid, softmax
  3. Training: Minimize loss using gradient descent
  4. CNN: Convolution + pooling for spatial features
  5. Overfitting: Combat with regularization, dropout, augmentation
  6. Data prep: Normalize, augment, split properly

Questions?

Thank you for your attention!


Next: Module 10 - Computer Vision & Deep Learning II

End of Module 09