OD-DETR: Online Distillation for Stabilizing Training of Detection Transformer

Authors: Shengjian Wu, Li Sun, Qingli Li

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that the proposed OD-DETR successfully stabilizes the training, and significantly increases the performance without bringing in more parameters.
Researcher Affiliation | Collaboration | Shengjian Wu¹,², Li Sun¹,³, Qingli Li¹. ¹Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University; ²Finvolution Group; ³Key Laboratory of Advanced Theory and Application in Statistics and Data Science, East China Normal University. Contact: sunli@ee.ecnu.edu.cn
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper does not provide any concrete statement or link regarding the availability of its source code.
Open Datasets | Yes | We conduct all our experiments on the MS-COCO [Lin et al., 2014] 2017 dataset and evaluate the performance of our models on the validation set using the mean average precision (mAP) metric. The COCO dataset contains 117K training images and 5K validation images. (A hedged evaluation sketch appears below the table.)
Dataset Splits | Yes | The COCO dataset contains 117K training images and 5K validation images.
Hardware Specification | No | No specific hardware details (exact GPU/CPU models, clock speeds, or memory amounts) used to run the experiments were found; only a generic mention of “8 GPUs” is present.
Software Dependencies | No | No specific ancillary software details (e.g., library or solver names with version numbers such as Python 3.8 or CPLEX 12.4) needed to replicate the experiment were found. Only hyperparameter values and training schedules are described.
Experiment Setup | Yes | Our experiments are conducted over 12 (1x) and 24 (2x) epochs on 8 GPUs. Learning rate settings for OD-DETR are identical to those of Def-DETR, with a learning rate of 2×10⁻⁵ for the backbone and 2×10⁻⁴ for the Transformer encoder-decoder framework, coupled with a weight decay of 2×10⁻⁵. The learning rates and batch sizes for OD-DAB-DETR and OD-DINO follow their respective baselines. We set the EMA decay value at 0.9996. (A hedged training-configuration sketch follows immediately below.)
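
To make the quoted experiment setup concrete, below is a minimal PyTorch sketch of the optimizer parameter groups and the EMA teacher update implied by the reported hyperparameters. This is a sketch under stated assumptions, not the authors' implementation (no code is released): the choice of AdamW, the "backbone" name filter, and the function names are assumptions for illustration; only the numeric values (learning rates 2×10⁻⁵ / 2×10⁻⁴, weight decay 2×10⁻⁵, EMA decay 0.9996) come from the table above.

```python
# Minimal sketch, NOT the authors' code: parameter groups and an EMA teacher
# update built only from the hyperparameters quoted in the table above.
import torch


def build_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    # Backbone at 2e-5, Transformer encoder-decoder at 2e-4, weight decay 2e-5
    # (values quoted above). AdamW and the "backbone" name filter are assumptions.
    backbone = [p for n, p in model.named_parameters() if "backbone" in n and p.requires_grad]
    transformer = [p for n, p in model.named_parameters() if "backbone" not in n and p.requires_grad]
    return torch.optim.AdamW(
        [{"params": backbone, "lr": 2e-5},
         {"params": transformer, "lr": 2e-4}],
        weight_decay=2e-5,
    )


@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module, decay: float = 0.9996) -> None:
    # Exponential moving average update of the distillation teacher, using the
    # EMA decay value reported above. Typical usage: create the teacher once via
    # copy.deepcopy(student), then call ema_update(teacher, student) after every
    # optimizer step.
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(decay).add_(s.detach(), alpha=1.0 - decay)
```

Because the EMA teacher reuses the student's architecture and is never trained by gradient descent, this kind of online distillation adds no learnable parameters, consistent with the paper's claim of improving performance "without bringing in more parameters."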
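The mAP evaluation on COCO val2017 described in the Open Datasets / Dataset Splits rows follows the standard pycocotools pipeline; a sketch is given below. The file paths and the detections file name are placeholders, not taken from the paper.

```python
# Sketch of standard COCO bbox mAP evaluation with pycocotools; paths are placeholders.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

ann_file = "annotations/instances_val2017.json"  # the 5K val2017 images
det_file = "od_detr_val2017_detections.json"     # detections in COCO result format (placeholder name)

coco_gt = COCO(ann_file)
coco_dt = coco_gt.loadRes(det_file)

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints AP, AP50, AP75, APs/APm/APl (the reported mAP metrics)
```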