Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

YOLOv12: Attention-Centric Real-Time Object Detectors

Authors: Yunjie Tian, Qixiang Ye, David Doermann

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform extensive experiments on standard object detection benchmarks following YOLO11 [31] without any additional tricks, demonstrating that YOLOv12 provides significant improvements over previous popular models in terms of latency-accuracy and FLOPs-accuracy trade-offs across these scales, as illustrated in Figure 1.
Researcher Affiliation | Academia | Yunjie Tian, University at Buffalo, EMAIL; Qixiang Ye, UCAS, EMAIL; David Doermann, University at Buffalo, EMAIL
Pseudocode | No | The paper describes the proposed methods and architectures using figures and text but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Source code is available at https://github.com/sunsmarterjie/yolov12.
Open Datasets | Yes | We validate YOLOv12 on the MSCOCO 2017 dataset [39].
Dataset Splits | Yes | We validate YOLOv12 on the MSCOCO 2017 dataset [39]... Following established conventions [26, 67, 61, 31], we report the standard mean average precision (mAP) on different object scales and IoU thresholds.
Hardware Specification | Yes | For example, YOLOv12-N achieves 40.5% mAP with an inference latency of 1.62 ms on a T4 GPU... The latencies of all models are tested on a T4 GPU using TensorRT FP16... CUDA results are measured on T4 / RTX 3080 GPUs... CPU (Intel Core i7-10700K @ 3.80GHz) speed... The N/S/M models are trained on 4 NVIDIA A6000 GPUs and the L/X models are trained on 8 NVIDIA A800 GPUs.
Software Dependencies | No | The latencies of all models are tested on a T4 GPU using TensorRT FP16... Following YOLOv11 [31], we adopt the Albumentations library [8].
Experiment Setup | Yes | All models are trained for 600 epochs using the SGD optimizer with an initial learning rate of 0.01, consistent with YOLO11 [31]. A linear learning rate decay schedule is adopted, with a linear warm-up for the first three epochs. Table 6: Hyperparameters for training the YOLOv12 family on COCO [39]. Training Configuration: Epochs 600; Optimizer SGD; Momentum 0.937; Batch size 32; Weight decay 5e-4; Warm-up epochs 3; Warm-up momentum 0.8; Warm-up bias learning rate 0.0; Initial learning rate 1e-2; Final learning rate 1e-4; Learning rate schedule Linear decay. Loss Parameters: Box loss gain 7.5; Class loss gain 0.5; DFL loss gain 1.5. Augmentation Parameters: HSV saturation augmentation 0.7; HSV value augmentation 0.4; HSV hue augmentation 0.015; Translation augmentation 0.1; Scale augmentation 0.5/0.9/0.9/0.9/0.9; Mosaic augmentation 1.0; Mixup augmentation 0.0/0.05/0.15/0.15/0.2; Copy-paste augmentation 0.1/0.15/0.4/0.5/0.6; Close mosaic epochs 10.
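The learning rate schedule quoted in the Experiment Setup row (linear warm-up for three epochs, then linear decay from 1e-2 to 1e-4 over 600 epochs) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' training code: the function name `learning_rate` and the constant names are hypothetical, the schedule is evaluated per epoch rather than per iteration, and the warm-up is assumed to ramp from the quoted warm-up bias learning rate of 0.0 up to the decayed target, a detail the quoted text does not specify.

```python
# Values taken from the quoted Table 6; names are hypothetical.
EPOCHS = 600
WARMUP_EPOCHS = 3
LR_INITIAL = 1e-2
LR_FINAL = 1e-4
WARMUP_START_LR = 0.0  # assumption: warm-up starts at the quoted warm-up bias LR


def learning_rate(epoch: float) -> float:
    """Return the learning rate at a (possibly fractional) epoch.

    Assumed shape: linear decay from LR_INITIAL to LR_FINAL over EPOCHS,
    with a linear ramp toward that decayed value during the warm-up epochs.
    """
    # Linear decay target for this point in training.
    decay_fraction = epoch / EPOCHS
    target = LR_INITIAL + (LR_FINAL - LR_INITIAL) * decay_fraction

    if epoch < WARMUP_EPOCHS:
        # Linear interpolation from the warm-up start value to the target.
        ramp = epoch / WARMUP_EPOCHS
        return WARMUP_START_LR + (target - WARMUP_START_LR) * ramp
    return target


if __name__ == "__main__":
    for e in (0, 1, 3, 300, 600):
        print(f"epoch {e:>3}: lr = {learning_rate(e):.6f}")
```

Under these assumptions the rate starts at 0.0, reaches roughly the initial value once warm-up ends, and falls to the final value of 1e-4 at epoch 600. A real training loop would typically also ramp momentum from the quoted warm-up value of 0.8 to 0.937, which this sketch omits.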