OD-DETR: Online Distillation for Stabilizing Training of Detection Transformer

Authors: Shengjian Wu, Li Sun, Qingli Li

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that the proposed OD-DETR successfully stabilizes the training, and significantly increases the performance without bringing in more parameters.
Researcher Affiliation | Collaboration | Shengjian Wu¹,², Li Sun¹,³, Qingli Li¹. ¹Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University; ²Finvolution Group; ³Key Laboratory of Advanced Theory and Application in Statistics and Data Science, East China Normal University. Contact: sunli@ee.ecnu.edu.cn
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper does not provide any concrete statement or link regarding the availability of its source code.
Open Datasets | Yes | We conduct all our experiments on the MS-COCO [Lin et al., 2014] 2017 dataset and evaluate the performance of our models on the validation set using the mean average precision (mAP) metric. The COCO dataset contains 117K training images and 5K validation images. (A hedged evaluation sketch appears below the table.)
Dataset Splits | Yes | The COCO dataset contains 117K training images and 5K validation images.
Hardware Specification | No | No specific hardware details (exact GPU/CPU models, clock speeds, or memory amounts) used to run the experiments were found; only a generic mention of “8 GPUs” is present.
Software Dependencies | No | No specific ancillary software details (e.g., library or solver names with version numbers such as Python 3.8 or CPLEX 12.4) needed to replicate the experiment were found. Only hyperparameter values and training schedules are described.
Experiment Setup | Yes | Our experiments are conducted over 12 (1x) and 24 (2x) epochs on 8 GPUs. Learning rate settings for OD-DETR are identical to those of Def-DETR, with a learning rate of 2×10⁻⁵ for the backbone and 2×10⁻⁴ for the Transformer encoder-decoder framework, coupled with a weight decay of 2×10⁻⁵. The learning rates and batch sizes for OD-DAB-DETR and OD-DINO follow their respective baselines. We set the EMA decay value at 0.9996. (A hedged training-configuration sketch follows immediately below.)
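
To make the quoted experiment setup concrete, below is a minimal PyTorch sketch of the optimizer parameter groups and the EMA teacher update implied by the reported hyperparameters. This is a sketch under stated assumptions, not the authors' implementation (no code is released): the choice of AdamW, the "backbone" name filter, and the function names are assumptions for illustration; only the numeric values (learning rates 2×10⁻⁵ / 2×10⁻⁴, weight decay 2×10⁻⁵, EMA decay 0.9996) come from the table above.

```python
# Minimal sketch, NOT the authors' code: parameter groups and an EMA teacher
# update built only from the hyperparameters quoted in the table above.
import torch


def build_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    # Backbone at 2e-5, Transformer encoder-decoder at 2e-4, weight decay 2e-5
    # (values quoted above). AdamW and the "backbone" name filter are assumptions.
    backbone = [p for n, p in model.named_parameters() if "backbone" in n and p.requires_grad]
    transformer = [p for n, p in model.named_parameters() if "backbone" not in n and p.requires_grad]
    return torch.optim.AdamW(
        [{"params": backbone, "lr": 2e-5},
         {"params": transformer, "lr": 2e-4}],
        weight_decay=2e-5,
    )


@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module, decay: float = 0.9996) -> None:
    # Exponential moving average update of the distillation teacher, using the
    # EMA decay value reported above. Typical usage: create the teacher once via
    # copy.deepcopy(student), then call ema_update(teacher, student) after every
    # optimizer step.
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(decay).add_(s.detach(), alpha=1.0 - decay)
```

Because the EMA teacher reuses the student's architecture and is never trained by gradient descent, this kind of online distillation adds no learnable parameters, consistent with the paper's claim of improving performance "without bringing in more parameters."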
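The mAP evaluation on COCO val2017 described in the Open Datasets / Dataset Splits rows follows the standard pycocotools pipeline; a sketch is given below. The file paths and the detections file name are placeholders, not taken from the paper.

```python
# Sketch of standard COCO bbox mAP evaluation with pycocotools; paths are placeholders.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

ann_file = "annotations/instances_val2017.json"  # the 5K val2017 images
det_file = "od_detr_val2017_detections.json"     # detections in COCO result format (placeholder name)

coco_gt = COCO(ann_file)
coco_dt = coco_gt.loadRes(det_file)

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints AP, AP50, AP75, APs/APm/APl (the reported mAP metrics)
```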