
YOLO Series

You Only Look Once: the evolution of a real-time object detection architecture, with study notes.

YOLO is a family of single-stage object detection models that predict bounding boxes and class probabilities for a whole image in a single forward pass, which makes them much faster than two-stage detectors.

Evolution Timeline

Version   Year   Key Innovation                      Study Note
YOLOv1    2016   First single-stage detector         📄 Note
YOLOv2    2017   Batch normalization, anchor boxes   📄 Note
YOLOv3    2018   Feature pyramid networks            📄 Note
YOLOv4    2020   CSPDarknet, mosaic augmentation     📄 Note
YOLOv5    2020   PyTorch implementation              📄 Note
YOLOv6    2022   Industrial-focused optimizations    📄 Note
YOLOv7    2022   E-ELAN architecture                 📄 Note
YOLOv8    2023   Unified framework                   📄 Note
YOLOv9    2024   Programmable Gradient Information   📄 Note

Foundation
Foundation

Before diving into YOLO, review the CNN Fundamentals notes to understand basic convolutional layers, pooling, and activation functions.

Key Concepts

Single-Stage vs. Two-Stage

Unlike the R-CNN series, which first proposes candidate regions and then classifies them (two-stage), YOLO frames object detection as a single regression problem, mapping image pixels directly to bounding box coordinates and class probabilities.
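To make the "single regression problem" concrete, here is a minimal Python sketch of turning one YOLOv1-style prediction tensor into candidate boxes. It assumes the YOLOv1 output shape S × S × (B·5 + C) and a hypothetical per-cell layout of B blocks of (x, y, w, h, confidence) followed by C class probabilities; real implementations may order these values differently. It also treats w and h as plain fractions of the image size, whereas the YOLOv1 paper regresses their square roots, so a faithful decoder would square them first. The function name `decode_yolov1` and the `threshold` default are illustrative only.

```python
from typing import List, Tuple

# (cx, cy, w, h, class_id, score), with coordinates as fractions of the image size
Detection = Tuple[float, float, float, float, int, float]


def decode_yolov1(output: List[List[List[float]]], S: int, B: int, C: int,
                  threshold: float = 0.2) -> List[Detection]:
    """Turn one S x S x (B*5 + C) prediction tensor into candidate boxes.

    Hypothetical per-cell layout: B blocks of (x, y, w, h, conf), then C class
    probabilities. x, y are offsets of the box centre within its cell; w, h are
    assumed to already be fractions of the image width and height.
    """
    detections: List[Detection] = []
    for row in range(S):
        for col in range(S):
            cell = output[row][col]
            if len(cell) != B * 5 + C:
                raise ValueError("unexpected cell vector length")
            class_probs = cell[B * 5:]
            # In YOLOv1 the class probabilities are shared by all B boxes of a cell.
            class_id = max(range(C), key=lambda c: class_probs[c])
            for b in range(B):
                x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
                # Class-specific score = box confidence * conditional class probability.
                score = conf * class_probs[class_id]
                if score < threshold:
                    continue
                # Convert the cell-relative centre to image-relative coordinates.
                cx = (col + x) / S
                cy = (row + y) / S
                detections.append((cx, cy, w, h, class_id, score))
    return detections
```

For example, on a 2 × 2 grid with B = 1 and C = 2, a single confident box predicted by the cell in row 1, column 0 decodes to a centre at (0.25, 0.75) in image coordinates. Every box comes out of the same tensor, with no separate proposal step; in practice the surviving boxes would then go through non-maximum suppression to remove duplicates.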

Grid Cell Strategy

YOLO divides the input image into an $S \times S$ grid. If the center of an object falls into a grid cell, that grid cell is responsible for detecting that object.
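The responsibility rule can be sketched as a small Python function. This is an illustrative helper (the name `responsible_cell` is hypothetical) that assumes pixel coordinates with the origin at the top-left corner. YOLOv1 itself used S = 7 on 448 × 448 inputs, so each cell covers 64 × 64 pixels.

```python
from typing import Tuple


def responsible_cell(cx: float, cy: float, img_w: float, img_h: float,
                     S: int = 7) -> Tuple[int, int, float, float]:
    """Return (row, col, x_off, y_off) for an object centred at pixel (cx, cy).

    The grid cell containing the centre is the one responsible for the object.
    x_off and y_off give the centre's position inside that cell, in [0, 1].
    """
    if not (0 <= cx <= img_w and 0 <= cy <= img_h):
        raise ValueError("object centre lies outside the image")
    gx = cx / img_w * S  # centre in grid units along x (columns)
    gy = cy / img_h * S  # centre in grid units along y (rows)
    # Clamp so a centre exactly on the right/bottom border maps to the last cell.
    col = min(int(gx), S - 1)
    row = min(int(gy), S - 1)
    return row, col, gx - col, gy - row
```

For example, an object centred at pixel (100, 300) in a 448 × 448 image falls in row 4, column 1, at offsets of roughly 0.5625 and 0.6875 within that cell. In YOLOv1 each cell then predicts B boxes with confidence scores plus C class probabilities, giving an output of size S × S × (B·5 + C); with S = 7, B = 2, and C = 20 for PASCAL VOC, that is 7 × 7 × 30.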

Reference Papers