Noel Jeffrey Pinton
Department of Computer Science
University of the Philippines Cebu
Noel Jeffrey Pinton
Department of Computer Science
University of the Philippines Cebu
By the end of this module, you will be able to:
Assigning labels to images
MNIST: 70,000 handwritten digits (28×28 grayscale). The "Hello World" of deep learning.
CIFAR-10: 60,000 color images (32×32) in 10 classes. More challenging than MNIST.
What the CNN learns at different layers
Metrics from confusion matrix:
Sample predictions showing correct classifications and errors
Finding and localizing objects
Object Detection: Not just "what" but "where".
Output: Class label + Bounding box (x, y, width, height)
YOLO (You Only Look Once):
Why is YOLO faster than two-stage detectors like Faster R-CNN?
YOLO processes the entire image in one forward pass through the network, directly predicting bounding boxes and classes. Two-stage detectors first generate region proposals, then classify each proposal, requiring multiple network evaluations.
Click the blurred area to reveal the answer
More accurate but slower than YOLO.
IoU: Measures overlap between predicted and ground truth boxes.
$$IoU = \frac{\text{Area of Intersection}}{\text{Area of Union}}$$
IoU > 0.5 typically considered a "correct" detection.
NMS: Remove duplicate detections.
Pixel-level classification
What is the difference between semantic and instance segmentation?
Semantic segmentation labels each pixel with a class but doesn't distinguish between different instances of the same class. Instance segmentation identifies each individual object separately (e.g., person 1, person 2).
Click the blurred area to reveal the answer
U-Net: Encoder-decoder with skip connections.
Input image and pixel-wise segmentation output
import torch
import torchvision.models as models
# Transfer learning with pretrained ResNet
model = models.resnet18(pretrained=True)
# Freeze feature extractor
for param in model.parameters():
param.requires_grad = False
# Replace classifier for new task
model.fc = torch.nn.Linear(512, num_classes)
# Object detection with torchvision
from torchvision.models.detection import fasterrcnn_resnet50_fpn
detector = fasterrcnn_resnet50_fpn(pretrained=True)
Key Takeaways
Thank you for your attention!
Next: Module 11 - Generative Models
Questions?