D3ETR: Decoder Distillation for Detection Transformer

Authors: Xiaokang Chen, Jiahui Chen, Yan Liu, Jiaxiang Tang, Gang Zeng

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform the experiments on the COCO 2017 [Lin et al., 2014b] detection dataset, which contains about 118K training (train) images and 5K validation (val) images. ... Our D3ETR obtains consistent gains over different backbones. ... The results are presented in Table 1. All the student detectors obtain significant mAP improvements with the knowledge transferred from teacher detectors. ... In this section, we first compare the proposed decoder distillation method to other CNN-based distillation methods in object detection. Subsequently, we conduct ablation studies to verify each component in our decoder distillation strategies.
Researcher Affiliation | Academia | Xiaokang Chen1, Jiahui Chen2, Yan Liu3, Jiaxiang Tang1 and Gang Zeng1; 1National Key Laboratory of General Artificial Intelligence, School of IST, Peking University; 2Beihang University; 3The Chinese University of Hong Kong
Pseudocode | No | The paper describes the proposed methods (MixMatcher, D3ETR) in detail but does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The code will be released.
Open Datasets | Yes | We perform the experiments on the COCO 2017 [Lin et al., 2014b] detection dataset, which contains about 118K training (train) images and 5K validation (val) images.
Dataset Splits | Yes | We perform the experiments on the COCO 2017 [Lin et al., 2014b] detection dataset, which contains about 118K training (train) images and 5K validation (val) images.
Hardware Specification | No | The paper mentions using backbones like 'ResNet-101-C5' but does not specify any hardware components (e.g., GPU models, CPU types) used for running the experiments.
Software Dependencies | No | We follow the training setting of DETR [Carion et al., 2020] and Conditional DETR [Meng et al., 2021] that use ImageNet pre-trained backbone from TORCHVISION with Batch Normalisation (BN) layers fixed. The transformer parameters are initialized using the Xavier initialization scheme [Glorot and Bengio, 2010]. We train our models for 12/50 epochs utilizing the AdamW [Loshchilov and Hutter, 2017] optimizer. ... The paper mentions software like TORCHVISION and the AdamW optimizer, but does not provide specific version numbers for these or other software libraries.
Experiment Setup | Yes | We train our models for 12/50 epochs utilizing the AdamW [Loshchilov and Hutter, 2017] optimizer. The learning rate is reduced by a factor of 10 after 11/40 epochs, respectively. ... The data augmentation scheme is identical to DETR [Carion et al., 2020]: the input image is resized such that the short side is at least 480 pixels and at most 800 pixels and the long side is at most 1333 pixels. The training image is then randomly cropped with a probability of 0.5 to a random rectangular patch. ... µcls = 20 is the tradeoff coefficient. ℓbox is a combination of ℓ1 loss and GIoU loss [Rezatofighi et al., 2019], with loss weights of 10 and 2, respectively. ... λsa is the loss weight and set to 10,000 as default. ... λca defaults to 10,000.
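The schedule and loss weights quoted above can be made concrete in a short, framework-free sketch. This is not the authors' released code (none is available); the function names and the `base_lr` value are illustrative assumptions, while the milestone epochs (11/40), the box-loss weights (10 for ℓ1, 2 for GIoU), and the distillation weights (λsa = λca = 10,000) are taken directly from the paper.

```python
# Hedged sketch of the reported training schedule and loss weighting.
# All function names and base_lr are illustrative; the numeric constants
# (drop epochs, loss weights) come from the quoted experiment setup.

def learning_rate(epoch, base_lr=1e-4, drop_epoch=11):
    """LR reduced by a factor of 10 after `drop_epoch`.

    The paper uses drop_epoch=11 for the 12-epoch schedule and
    drop_epoch=40 for the 50-epoch schedule.
    """
    return base_lr if epoch < drop_epoch else base_lr / 10.0

def box_loss(l1_term, giou_term, w_l1=10.0, w_giou=2.0):
    """DETR-style box loss: weighted sum of L1 and GIoU terms (10 and 2)."""
    return w_l1 * l1_term + w_giou * giou_term

def distillation_loss(sa_term, ca_term, lambda_sa=10_000.0, lambda_ca=10_000.0):
    """Decoder-distillation terms weighted by lambda_sa / lambda_ca,
    both defaulting to 10,000 as stated in the paper."""
    return lambda_sa * sa_term + lambda_ca * ca_term
```

The large λ values compensate for the small magnitude of the attention-map discrepancy terms, so the distillation signal is on a comparable scale to the detection losses.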