Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement
Authors: Yansong Peng, Hebei Li, Peixi Wu, Yueyi Zhang, Xiaoyan Sun, Feng Wu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the COCO dataset (Lin et al., 2014a) demonstrate that D-FINE achieves state-of-the-art performance in real-time object detection, surpassing existing models in accuracy and efficiency. D-FINE-L and D-FINE-X achieve 54.0% and 55.8% AP on the COCO dataset at 124 / 78 FPS on an NVIDIA T4 GPU. |
| Researcher Affiliation | Academia | 1University of Science and Technology of China 2Institute of Artificial Intelligence, Hefei Comprehensive National Science Center EMAIL EMAIL |
| Pseudocode | No | The paper describes methods (FDR, GO-LSD) mathematically and textually (e.g., equations 2, 3, 5, 6) and uses figures to illustrate processes, but it does not include a distinct section or figure labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | Our code and models: https://github.com/Peterande/D-FINE. |
| Open Datasets | Yes | Experimental results on the COCO dataset (Lin et al., 2014a) demonstrate that D-FINE achieves state-of-the-art performance... We further pretrain D-FINE and YOLOv10 on the Objects365 dataset (Shao et al., 2019), before finetuning them on COCO. |
| Dataset Splits | Yes | We use the standard COCO2017 (Lin et al., 2014b) data splitting policy, training on COCO train2017, and evaluating on COCO val2017. |
| Hardware Specification | Yes | We measure end-to-end latency using TensorRT FP16 on an NVIDIA T4 GPU. ... The baseline model achieves an AP of 53.0%, with a training time of 29 minutes per epoch and memory usage of 8552 MB on four NVIDIA RTX 4090 GPUs. |
| Software Dependencies | No | The paper mentions 'TensorRT FP16' but does not provide a specific version number. It also mentions the 'AdamW optimizer' but without a version or full software stack details. Therefore, explicit versioned software dependencies are not provided. |
| Experiment Setup | Yes | Table 6 summarizes the hyperparameter configurations for the D-FINE models. All variants use HGNetV2 backbones pretrained on ImageNet (Cui et al., 2021; Russakovsky et al., 2015) and the AdamW optimizer. D-FINE-X is set with an embedding dimension of 384 and a feedforward dimension of 2048, while the other models use 256 and 1024, respectively. D-FINE-X and D-FINE-L have 6 decoder layers... The base learning rate and weight decay for D-FINE-X and D-FINE-L are 2.5e-4 and 1.25e-4, respectively... The total batch size is 32 across all variants. Training schedules include 72 epochs with advanced augmentation... followed by 2 epochs without advanced augmentation for D-FINE-X and D-FINE-L... |
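For readers attempting to reproduce the setup, the hyperparameters quoted in the Experiment Setup row can be collected into a minimal configuration sketch. The dictionary keys and structure below are illustrative assumptions, not the paper's actual config format; only the numeric values come from the quoted Table 6 excerpt, and the excerpt's "respectively" is read as learning rate 2.5e-4 and weight decay 1.25e-4.

```python
# Hypothetical configuration sketch assembled from the quoted Table 6 excerpt.
# Key names are illustrative; only the values are taken from the paper's text.
D_FINE_CONFIGS = {
    "D-FINE-X": {
        "backbone": "HGNetV2",   # pretrained on ImageNet, per the excerpt
        "embed_dim": 384,
        "ffn_dim": 2048,
        "decoder_layers": 6,
        "base_lr": 2.5e-4,
        "weight_decay": 1.25e-4,
    },
    "D-FINE-L": {
        "backbone": "HGNetV2",
        "embed_dim": 256,        # "the other models use 256 and 1024"
        "ffn_dim": 1024,
        "decoder_layers": 6,
        "base_lr": 2.5e-4,
        "weight_decay": 1.25e-4,
    },
}

# Settings the excerpt states are shared across variants.
SHARED = {
    "optimizer": "AdamW",
    "batch_size": 32,
    "epochs_with_advanced_aug": 72,
    "epochs_without_advanced_aug": 2,  # for D-FINE-X / D-FINE-L only
}
```

Values for the smaller variants (and any settings elided by "..." in the quote) are not reproduced here; consult Table 6 of the paper or the released repository for the full configuration.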