
    DETR (Detection Transformer)

    Also known as:
    Detection Transformer
    DETR
    End-to-End Object Detection with Transformers
    Updated: 2/10/2026

    A transformer-based model for object detection that predicts bounding boxes directly as a set, without anchor boxes or hand-crafted post-processing.

    Quick Summary

    DETR brought transformers to object detection: an end-to-end model without anchor boxes or NMS, formulating detection as set prediction via bipartite matching.

    Explanation

    DETR drastically simplifies the object detection pipeline: no anchor boxes, no NMS (Non-Maximum Suppression). Instead, a transformer decoder turns a fixed set of learned object queries into box predictions, and during training these predictions are matched one-to-one to ground-truth objects via bipartite (Hungarian) matching.
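    The matching step above can be sketched in a few lines. This is an illustrative simplification, not DETR's actual cost (the paper combines class probability, L1 box distance, and generalized IoU): here the pairwise cost is plain L1 distance on (cx, cy, w, h) boxes, and a brute-force search over permutations stands in for the Hungarian algorithm (in practice, e.g. scipy.optimize.linear_sum_assignment).

    ```python
    from itertools import permutations

    def l1_cost(pred, gt):
        # L1 distance between two boxes in (cx, cy, w, h) format.
        return sum(abs(p - g) for p, g in zip(pred, gt))

    def bipartite_match(preds, gts):
        """Return the one-to-one assignment (one prediction index per
        ground-truth box) with the lowest total cost. Brute force over
        permutations; real implementations use the Hungarian algorithm."""
        best_cost, best_assign = float("inf"), None
        for perm in permutations(range(len(preds)), len(gts)):
            cost = sum(l1_cost(preds[p], gts[g]) for g, p in enumerate(perm))
            if cost < best_cost:
                best_cost, best_assign = cost, perm
        return best_assign, best_cost

    # Three predictions (object queries), two ground-truth boxes; the
    # unmatched prediction would be supervised as "no object" in DETR.
    preds = [(0.9, 0.9, 0.2, 0.2), (0.1, 0.1, 0.2, 0.2), (0.5, 0.5, 0.5, 0.5)]
    gts   = [(0.1, 0.1, 0.2, 0.2), (0.5, 0.5, 0.4, 0.4)]
    assign, cost = bipartite_match(preds, gts)
    ```

    Because the matching is one-to-one, no two queries can claim the same object, which is why DETR needs no NMS to suppress duplicates.
    
    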

    Marketing Relevance

    DETR demonstrates that transformers can deliver end-to-end solutions in vision as well, and it is the foundation for subsequent models such as DINO, DAB-DETR, and RT-DETR.

    Example

    RT-DETR (Real-Time DETR) is used for real-time object detection in autonomous systems, offering transformer-level accuracy at YOLO-like speed.

    Common Pitfalls

    Slow training convergence, weak performance on small objects, and higher compute requirements than YOLO.

    Origin & History

    Facebook AI Research released DETR in May 2020. It was the first successful transformer model for object detection. Deformable DETR (2021) solved convergence issues. RT-DETR (2023, Baidu) achieved real-time capability.

    Comparisons & Differences

    DETR (Detection Transformer) vs. YOLO

    YOLO is CNN-based and extremely fast. DETR is transformer-based, more accurate on complex scenes but slower.

    DETR (Detection Transformer) vs. Faster R-CNN

    Faster R-CNN uses region proposals + NMS. DETR eliminates both through set prediction with Hungarian matching.

