Computer Vision & Deep Learning II

CMSC 178IP - Module 10

Noel Jeffrey Pinton
Department of Computer Science
University of the Philippines Cebu

Learning Objectives

By the end of this module, you will be able to:

  1. Build image classification systems
  2. Understand object detection architectures (YOLO, Faster R-CNN)
  3. Implement semantic segmentation with U-Net
  4. Evaluate models using IoU, mAP, and other metrics
  5. Apply transfer learning for computer vision tasks

Image Classification

Assigning labels to images

MNIST Dataset

MNIST: 70,000 handwritten digit images (28×28 grayscale), split into 60,000 training and 10,000 test images. The "Hello World" of deep learning.

CIFAR-10 Dataset

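CIFAR-10: 60,000 color images (32×32) in 10 classes, with 6,000 images per class. More challenging than MNIST because the images are natural photos with color, background clutter, and varied viewpoints.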

CNN for Classification

Classification Pipeline:

  • Conv layers → Feature extraction
  • FC layers → Classification
  • Softmax → Probability distribution over classes
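
The pipeline above can be sketched as a small PyTorch model. This is only an illustration, assuming 32×32 RGB inputs (CIFAR-10-sized); the class name `SmallCNN` and all layer sizes are arbitrary choices made for this example, not a recommended architecture.

```python
import torch
from torch import nn


class SmallCNN(nn.Module):
    """Tiny classifier for 32x32 RGB images; layer sizes are illustrative."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Conv layers: feature extraction
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 16x16 -> 8x8
        )
        # FC layers: classification
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        # Returns raw scores (logits), one per class
        return self.classifier(self.features(x))


model = SmallCNN(num_classes=10)
images = torch.randn(4, 3, 32, 32)    # a random batch standing in for real images
logits = model(images)
probs = torch.softmax(logits, dim=1)  # softmax: probability distribution over classes
predicted = probs.argmax(dim=1)       # most likely class per image
```

During training, `nn.CrossEntropyLoss` is normally applied to the raw logits, since it includes the softmax step internally; the explicit `torch.softmax` call is only needed when probabilities are wanted at inference time.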

Feature Maps Visualization

What the CNN learns at different layers

Training and Evaluation

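Monitor: Loss and accuracy on both the training and validation sets. A widening gap between the two (training loss still falling while validation loss stalls or rises) indicates overfitting.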

Confusion Matrix

Metrics from the confusion matrix, computed per class (TP = true positives, FP = false positives, FN = false negatives):

  • Precision = TP / (TP + FP)
  • Recall = TP / (TP + FN)
  • F1 = 2 × (Precision × Recall) / (Precision + Recall)
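
As a worked example of the formulas above, the sketch below derives per-class precision, recall, and F1 from a multiclass confusion matrix. It assumes the convention that rows are true classes and columns are predicted classes; the function name and the 3-class matrix are made up for illustration.

```python
def per_class_metrics(cm):
    """Return (precision, recall, f1) for each class.

    Assumes cm[i][j] = number of samples with true class i predicted as class j.
    """
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, but true class differs
        fn = sum(cm[k]) - tp                       # true class k, but predicted otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results


# Hypothetical 3-class confusion matrix (10 samples per true class)
cm = [
    [8, 2, 0],
    [1, 7, 2],
    [1, 1, 8],
]
metrics = per_class_metrics(cm)
for k, (p, r, f1) in enumerate(metrics):
    print(f"class {k}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```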

Multiclass Predictions

Sample predictions showing correct classifications and errors

Object Detection

Finding and localizing objects

Object Detection Task

Object Detection: Not just "what" but also "where".

Output: For each detected object, a class label + bounding box (x, y, width, height)

YOLO Architecture

YOLO (You Only Look Once):

  • Single-shot (single-stage) detector: one forward pass over the whole image, so it is very fast
  • Divides the image into grid cells
  • Each cell predicts B boxes + class probabilities
  • Real-time detection is possible
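
To make the grid idea concrete, the sketch below follows the layout of the original YOLO (v1) formulation, in which each of the S×S cells outputs B boxes (5 numbers each: x, y, w, h, confidence) plus C class probabilities. The function names and the 448×448 example image are assumptions for illustration; later YOLO versions organize their outputs differently.

```python
def yolo_output_shape(S, B, C):
    """Shape of the prediction tensor in the YOLO v1 layout.

    Each cell predicts B boxes (x, y, w, h, confidence) and C class probabilities.
    """
    return (S, S, B * 5 + C)


def responsible_cell(cx, cy, img_w, img_h, S):
    """Grid cell (row, col) containing an object's center point (cx, cy)."""
    col = min(int(cx / img_w * S), S - 1)  # clamp in case cx == img_w
    row = min(int(cy / img_h * S), S - 1)  # clamp in case cy == img_h
    return row, col


# Settings from the original YOLO paper: 7x7 grid, 2 boxes per cell, 20 classes
shape = yolo_output_shape(S=7, B=2, C=20)
cell = responsible_cell(cx=224, cy=100, img_w=448, img_h=448, S=7)
print(shape)  # (7, 7, 30)
print(cell)   # (1, 3)
```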

Knowledge Check

Think About It

Why is YOLO faster than two-stage detectors like Faster R-CNN?

Faster R-CNN

Two-stage detector:

  1. Region Proposal Network (RPN): Generate candidate boxes
  2. Classification head: Classify each candidate and refine its box coordinates

Typically more accurate but slower than YOLO.

IoU (Intersection over Union)

IoU: Measures overlap between predicted and ground truth boxes.

$$\mathrm{IoU} = \frac{\text{Area of Intersection}}{\text{Area of Union}}$$

A prediction with IoU > 0.5 is typically counted as a "correct" detection; stricter thresholds are also used.
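
A minimal sketch of the IoU computation for two axis-aligned boxes. It assumes boxes are given as (x, y, width, height) with (x, y) being the top-left corner; some datasets and libraries use the box center or corner pairs instead, so the format should be checked before reusing this. The function name is illustrative.

```python
def iou_xywh(box_a, box_b):
    """IoU of two boxes in (x, y, width, height) format, (x, y) = top-left corner."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b

    # Corners of the intersection rectangle
    ix1, iy1 = max(ax, bx), max(ay, by)
    ix2, iy2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)

    # Width and height are clamped at 0 when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


overlap = iou_xywh((0, 0, 10, 10), (5, 5, 10, 10))    # 25 / 175, about 0.143
identical = iou_xywh((0, 0, 10, 10), (0, 0, 10, 10))  # 1.0
disjoint = iou_xywh((0, 0, 10, 10), (20, 20, 5, 5))   # 0.0
```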

Non-Maximum Suppression

NMS: Remove duplicate detections of the same object.

  1. Sort boxes by confidence
  2. Keep the highest-confidence box
  3. Remove remaining boxes whose IoU with the kept box exceeds the threshold
  4. Repeat for the remaining boxes
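
The steps above can be sketched in plain Python. This is a simple greedy version for illustration, assuming boxes in (x, y, width, height) format with a top-left origin and a default threshold of 0.5; in practice NMS is usually run separately per class, and libraries provide optimized versions (for example `torchvision.ops.nms`, which expects corner-format boxes).

```python
def iou(a, b):
    """IoU of two (x, y, width, height) boxes, (x, y) = top-left corner."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2 = min(a[0] + a[2], b[0] + b[2])
    iy2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept by greedy non-maximum suppression."""
    # 1. Sort box indices by confidence, highest first
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        # 2. Keep the highest-confidence remaining box
        best = order.pop(0)
        keep.append(best)
        # 3. Drop boxes that overlap the kept box too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
        # 4. Loop repeats on whatever is left
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (50, 50, 10, 10)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # [0, 2]: box 1 overlaps box 0 (IoU about 0.68) and is suppressed
```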

Mean Average Precision

mAP: Standard metric for object detection.

  • AP (Average Precision): Area under the precision-recall curve for one class
  • mAP: Mean of AP across all classes
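
A minimal sketch of how AP and mAP can be computed, assuming each detection for a class has already been marked as a true or false positive (for example by IoU matching against ground truth) and sorted by descending confidence. It uses all-point interpolation of the precision-recall curve, which is one common convention; benchmarks differ in the details (COCO, for instance, also averages over several IoU thresholds). Function names and the toy inputs are illustrative.

```python
def average_precision(is_tp, num_gt):
    """Area under the precision-recall curve for one class.

    is_tp:  list of booleans, one per detection, sorted by descending confidence
    num_gt: number of ground-truth objects of this class
    """
    if num_gt == 0 or not is_tp:
        return 0.0

    tp = fp = 0
    precisions, recalls = [], []
    for hit in is_tp:
        if hit:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)

    # Make precision non-increasing from left to right (interpolated envelope)
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum rectangle areas wherever recall increases
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """mAP: mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


ap_a = average_precision([True, False, True], num_gt=2)  # 5/6, about 0.833
ap_b = average_precision([True, True], num_gt=2)         # 1.0
m_ap = mean_average_precision([ap_a, ap_b])
```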

Semantic Segmentation

Pixel-level classification

Segmentation Types

Semantic segmentation:

  • Classify every pixel
  • No instance distinction

Instance segmentation:

  • Separate each object
  • Distinguishes individuals

Knowledge Check

Think About It

What is the difference between semantic and instance segmentation?

U-Net Architecture

U-Net: Encoder-decoder with skip connections.

  • Encoder: Downsample, extract features
  • Decoder: Upsample, recover spatial detail
  • Skip connections: Pass encoder feature maps to the matching decoder stage, preserving fine details lost during downsampling
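
The three components above can be sketched as a very small U-Net-style network in PyTorch. This is a reduced illustration with only two encoder levels, not the full published architecture; the class name `TinyUNet`, the channel widths, and the use of padded convolutions are assumptions made for this example. Input height and width are assumed to be divisible by 4.

```python
import torch
from torch import nn


def conv_block(in_ch, out_ch):
    """Two 3x3 convolutions with ReLU; padding keeps the spatial size."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(),
    )


class TinyUNet(nn.Module):
    def __init__(self, in_ch=3, num_classes=2):
        super().__init__()
        # Encoder: downsample, extract features
        self.enc1 = conv_block(in_ch, 16)
        self.enc2 = conv_block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(32, 64)
        # Decoder: upsample, recover spatial detail
        self.up2 = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)
        self.dec2 = conv_block(64, 32)  # 32 upsampled + 32 skip channels
        self.up1 = nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2)
        self.dec1 = conv_block(32, 16)  # 16 upsampled + 16 skip channels
        self.head = nn.Conv2d(16, num_classes, kernel_size=1)

    def forward(self, x):
        e1 = self.enc1(x)                    # full resolution
        e2 = self.enc2(self.pool(e1))        # 1/2 resolution
        b = self.bottleneck(self.pool(e2))   # 1/4 resolution
        # Skip connections: concatenate encoder features with upsampled features
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)                 # per-pixel class logits


net = TinyUNet(in_ch=3, num_classes=2)
image = torch.randn(1, 3, 64, 64)  # random tensor standing in for an image
seg_logits = net(image)            # shape (1, 2, 64, 64)
mask = seg_logits.argmax(dim=1)    # predicted class per pixel, shape (1, 64, 64)
```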

Segmentation Example

Input image and pixel-wise segmentation output

Implementation

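import torch
import torchvision.models as models
from torchvision.models.detection import fasterrcnn_resnet50_fpn

num_classes = 10  # number of classes in the new task (example value)

# Transfer learning with a pretrained ResNet-18
# (torchvision >= 0.13 uses `weights=`; `pretrained=True` is deprecated)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the feature extractor
for param in model.parameters():
    param.requires_grad = False

# Replace the classifier for the new task; the new layer is trainable by default
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)

# Object detection with torchvision (Faster R-CNN with pretrained weights)
detector = fasterrcnn_resnet50_fpn(weights="DEFAULT")
detector.eval()  # switch to inference mode before predicting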

Summary

Key Takeaways

  1. Classification: Assign a single label to each image
  2. Object Detection: YOLO (single-stage, fast), Faster R-CNN (two-stage, typically more accurate)
  3. IoU & NMS: Essential for evaluating and refining detections
  4. mAP: Standard detection evaluation metric
  5. Segmentation: Pixel-level classification (U-Net)
  6. Transfer learning: Leverage pretrained models

Questions?

Thank you for your attention!


Next: Module 11 - Generative Models
