# YOLO Series
You Only Look Once — Real-time object detection architecture evolution and study notes.
YOLO is a family of single-stage object detection models that process images in a single forward pass, making them extremely fast compared to two-stage detectors.
## Evolution Timeline
| Version | Year | Key Innovation | Study Note |
|---|---|---|---|
| YOLOv1 | 2016 | First single-stage detector | 📄 Note |
| YOLOv2 | 2017 | Batch normalization, anchor boxes | 📄 Note |
| YOLOv3 | 2018 | Feature pyramid networks | 📄 Note |
| YOLOv4 | 2020 | CSPDarknet, mosaic augmentation | 📄 Note |
| YOLOv5 | 2020 | PyTorch implementation | 📄 Note |
| YOLOv6 | 2022 | Industrial-focused optimizations | 📄 Note |
| YOLOv7 | 2022 | E-ELAN architecture | 📄 Note |
| YOLOv8 | 2023 | Unified framework | 📄 Note |
| YOLOv9 | 2024 | Programmable Gradient Information | 📄 Note |
Before diving into YOLO, it is recommended to review the CNN Fundamentals to understand basic convolutional layers, pooling, and activation functions.
## Key Concepts
### Single-Stage vs. Two-Stage
Unlike the R-CNN series, which first proposes regions and then classifies them (two-stage), YOLO frames object detection as a single regression problem, going straight from image pixels to bounding box coordinates and class probabilities.
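As a concrete illustration, the YOLOv1 head predicts, for each grid cell, B boxes (x, y, w, h, confidence) plus C class probabilities, giving an S × S × (B·5 + C) output tensor. A minimal sketch using the values from the original paper (S = 7, B = 2, C = 20 on PASCAL VOC):

```python
# Sketch of YOLOv1's output tensor layout (values from the original paper).
S = 7   # grid size (S x S cells)
B = 2   # boxes predicted per cell
C = 20  # number of classes (PASCAL VOC)

# Each cell predicts B boxes of (x, y, w, h, confidence) plus C class scores.
depth = B * 5 + C
output_shape = (S, S, depth)
print(output_shape)  # (7, 7, 30) — the single regression target
```

This is why YOLO needs only one forward pass: the entire detection output is a single fixed-size tensor regressed directly from the image.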
### Grid Cell Strategy
YOLO divides the input image into an S × S grid. If the center of an object falls into a grid cell, that grid cell is responsible for detecting that object.
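The "responsible cell" assignment can be sketched in a few lines; here the helper name `responsible_cell` and the normalized-coordinate convention are illustrative assumptions, not part of any YOLO library API:

```python
def responsible_cell(cx, cy, S=7):
    """Return (row, col) of the grid cell responsible for an object whose
    normalized center is (cx, cy), with both coordinates in [0, 1).

    The image is conceptually split into an S x S grid; flooring the
    scaled center coordinates picks the cell the center falls into.
    """
    return int(cy * S), int(cx * S)

# An object centered at (0.52, 0.31) in a 7x7 grid lands in row 2, col 3:
print(responsible_cell(0.52, 0.31))  # (2, 3)
```

During training, only the responsible cell's predictions incur the localization and confidence loss for that object, which is what makes the grid assignment meaningful.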