DETR (Detection Transformer)
A transformer-based model that frames object detection as a set prediction problem, predicting bounding boxes directly without anchor boxes.
DETR brought transformers to object detection – end-to-end without anchor boxes or NMS, using set prediction via bipartite matching.
Explanation
DETR drastically simplifies the object detection pipeline: no anchor boxes, no NMS (Non-Maximum Suppression). Instead, a transformer decoder outputs a fixed-size set of predictions, and during training bipartite matching assigns each ground-truth object to exactly one prediction.
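The matching step can be sketched in a few lines. This is a minimal illustrative example (the function names and toy cost terms are our own): the matching cost combines a classification term and an L1 box term, and the best one-to-one assignment is found by brute force over permutations. DETR's actual cost also includes a generalized-IoU term, and real implementations use the Hungarian algorithm (e.g. `scipy.optimize.linear_sum_assignment`) instead of brute force.

```python
from itertools import permutations

def l1_cost(box_a, box_b):
    # L1 distance between (cx, cy, w, h) boxes, as in DETR's matching cost
    return sum(abs(a - b) for a, b in zip(box_a, box_b))

def match(pred_boxes, pred_class_probs, gt_boxes, gt_labels):
    """Minimum-cost one-to-one matching of ground truths to predictions.
    Brute force over permutations for illustration only; DETR uses the
    Hungarian algorithm for this assignment problem."""
    n_pred, n_gt = len(pred_boxes), len(gt_boxes)
    best_cost, best_assign = float("inf"), None
    for perm in permutations(range(n_pred), n_gt):
        cost = 0.0
        for gt_idx, pred_idx in enumerate(perm):
            # classification cost: negative probability of the true class
            cost += -pred_class_probs[pred_idx][gt_labels[gt_idx]]
            # box regression cost
            cost += l1_cost(pred_boxes[pred_idx], gt_boxes[gt_idx])
        if cost < best_cost:
            best_cost, best_assign = cost, perm
    return best_assign  # best_assign[i] = index of prediction matched to GT i

# toy example: 3 object queries, 2 ground-truth objects, 2 classes
preds  = [(0.10, 0.10, 0.20, 0.20), (0.50, 0.50, 0.30, 0.30), (0.90, 0.90, 0.10, 0.10)]
probs  = [(0.9, 0.1), (0.2, 0.8), (0.5, 0.5)]  # per-query class probabilities
gts    = [(0.52, 0.48, 0.28, 0.32), (0.12, 0.09, 0.18, 0.22)]
labels = [1, 0]

print(match(preds, probs, gts, labels))  # (1, 0): prediction 1 matched to GT 0, prediction 0 to GT 1
```

Queries left unmatched (here, query 2) are trained to predict a special "no object" class, which is what makes NMS unnecessary.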
Marketing Relevance
DETR demonstrates that transformers can deliver end-to-end solutions in vision as well, serving as the foundation for subsequent models such as DAB-DETR, DINO, and RT-DETR.
Example
RT-DETR (Real-Time DETR) is used for real-time object detection in autonomous systems, combining transformer accuracy with YOLO-level speed.
Common Pitfalls
Slow training convergence (the original DETR needed roughly 500 epochs). Weak performance on small objects. Higher compute requirements than YOLO.
Origin & History
Facebook AI Research released DETR in May 2020 as the first successful transformer-based object detector. Deformable DETR (2021) addressed the slow convergence and small-object weaknesses. RT-DETR (2023, Baidu) achieved real-time capability.
Comparisons & Differences
DETR (Detection Transformer) vs. YOLO
YOLO is CNN-based and extremely fast. DETR is transformer-based, more accurate on complex scenes but slower.
DETR (Detection Transformer) vs. Faster R-CNN
Faster R-CNN uses region proposals + NMS. DETR eliminates both through set prediction with Hungarian matching.
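For contrast, the post-processing step that DETR eliminates can be sketched as a minimal greedy NMS (an illustrative stand-alone sketch with our own function names, not Faster R-CNN's actual implementation):

```python
def iou(a, b):
    # intersection-over-union of two (x1, y1, x2, y2) boxes
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, suppress overlapping ones."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep

boxes  = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 is suppressed as a duplicate of box 0
```

Because DETR's one-to-one matching already penalizes duplicate predictions during training, its raw outputs need no such deduplication step.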