Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement

Authors: Yansong Peng, Hebei Li, Peixi Wu, Yueyi Zhang, Xiaoyan Sun, Feng Wu

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the COCO dataset (Lin et al., 2014a) demonstrate that D-FINE achieves state-of-the-art performance in real-time object detection, surpassing existing models in accuracy and efficiency. D-FINE-L and D-FINE-X achieve 54.0% and 55.8% AP on the COCO dataset at 124 / 78 FPS on an NVIDIA T4 GPU.
Researcher Affiliation | Academia | ¹University of Science and Technology of China ²Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
Pseudocode | No | The paper describes its methods (FDR, GO-LSD) mathematically and textually (e.g., Equations 2, 3, 5, and 6) and uses figures to illustrate processes, but it does not include a distinct section or figure labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | Our code and models: https://github.com/Peterande/D-FINE.
Open Datasets | Yes | Experimental results on the COCO dataset (Lin et al., 2014a) demonstrate that D-FINE achieves state-of-the-art performance... We further pretrain D-FINE and YOLOv10 on the Objects365 dataset (Shao et al., 2019), before finetuning them on COCO.
Dataset Splits | Yes | We use the standard COCO2017 (Lin et al., 2014b) data splitting policy, training on COCO train2017 and evaluating on COCO val2017.
Hardware Specification | Yes | We measure end-to-end latency using TensorRT FP16 on an NVIDIA T4 GPU. ... The baseline model achieves an AP of 53.0%, with a training time of 29 minutes per epoch and memory usage of 8552 MB on four NVIDIA RTX 4090 GPUs.
Software Dependencies | No | The paper mentions 'TensorRT FP16' but does not provide a specific version number. It also mentions the 'AdamW optimizer' but without a version or full software-stack details. Explicit versioned software dependencies are therefore not provided.
Experiment Setup | Yes | Table 6 summarizes the hyperparameter configurations for the D-FINE models. All variants use HGNetV2 backbones pretrained on ImageNet (Cui et al., 2021; Russakovsky et al., 2015) and the AdamW optimizer. D-FINE-X is set with an embedding dimension of 384 and a feedforward dimension of 2048, while the other models use 256 and 1024, respectively. D-FINE-X and D-FINE-L have 6 decoder layers... The base learning rate and weight decay for D-FINE-X and D-FINE-L are 2.5e-4 and 1.25e-4, respectively... The total batch size is 32 across all variants. Training schedules include 72 epochs with advanced augmentation... followed by 2 epochs without advanced augmentation for D-FINE-X and D-FINE-L...
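The hyperparameters quoted in the Experiment Setup row can be collected into a small config sketch. The dictionary below is illustrative only: the key names (`embed_dim`, `ffn_dim`, etc.) are not from the paper, and the reading that the learning rate (2.5e-4) and weight decay (1.25e-4) are shared by both models is one interpretation of the quoted "respectively" clause.

```python
# Hyperparameter sketch for two D-FINE variants, transcribed from the
# Experiment Setup excerpt above. Key names are illustrative; consult
# Table 6 of the paper / the official repo for authoritative values.
DFINE_CONFIGS = {
    "D-FINE-X": {
        "backbone": "HGNetV2",      # pretrained on ImageNet
        "optimizer": "AdamW",
        "embed_dim": 384,           # embedding dimension
        "ffn_dim": 2048,            # feedforward dimension
        "decoder_layers": 6,
        "base_lr": 2.5e-4,          # interpretation: lr/wd apply to both models
        "weight_decay": 1.25e-4,
        "batch_size": 32,
        "epochs_advanced_aug": 72,  # epochs with advanced augmentation
        "epochs_plain": 2,          # epochs without advanced augmentation
    },
    "D-FINE-L": {
        "backbone": "HGNetV2",
        "optimizer": "AdamW",
        "embed_dim": 256,           # "the other models use 256 and 1024"
        "ffn_dim": 1024,
        "decoder_layers": 6,
        "base_lr": 2.5e-4,
        "weight_decay": 1.25e-4,
        "batch_size": 32,
        "epochs_advanced_aug": 72,
        "epochs_plain": 2,
    },
}
```

A structure like this makes it easy to diff variants programmatically when checking a reproduction against the reported setup.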