Instance-Conditional Knowledge Distillation for Object Detection

Authors: Zijian Kang, Peizhen Zhang, Xiangyu Zhang, Jian Sun, Nanning Zheng

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the efficacy of our method: we observe impressive improvements under various settings. We perform comprehensive experiments on challenging benchmarks. Results demonstrate impressive improvements over various detectors with up to 4 AP gain in MS-COCO, including recent detectors for instance segmentation [41, 46, 16].
Researcher Affiliation | Collaboration | Zijian Kang, Xi'an Jiaotong University, kzj123@stu.xjtu.edu.cn; Peizhen Zhang, MEGVII Technology, zhangpeizhen@megvii.com; Xiangyu Zhang, MEGVII Technology, zhangxiangyu@megvii.com; Jian Sun, MEGVII Technology, sunjian@megvii.com; Nanning Zheng, Xi'an Jiaotong University, nnzheng@mail.xjtu.edu.cn
Pseudocode | No | The paper includes mathematical formulations and architectural diagrams (e.g., Figure 1, Figure 2), but does not contain structured pseudocode blocks or sections explicitly labeled 'Algorithm'.
Open Source Code | Yes | Code has been released at https://github.com/megvii-research/ICD.
Open Datasets | Yes | Most experiments are conducted on a large-scale object detection benchmark, MS-COCO [31], with 80 classes. MS-COCO is publicly available; the annotations are licensed under a Creative Commons Attribution 4.0 License, and the use of the images follows the Flickr Terms of Use. Refer to [31] for more details.
Dataset Splits | Yes | We train models on the MS-COCO 2017 trainval115k subset and validate on the minival subset.
Hardware Specification | Yes | All experiments are run on eight 2080ti GPUs with 2 images per GPU. Specifically, we benchmark the 1x schedule on RetinaNet [30] with eight 2080ti GPUs, following the same configuration as in Section 4.2.
Software Dependencies | No | We conduct experiments on PyTorch [34] with the widely used Detectron2 library [47] and the AdelaiDet library [40]. While these software components are mentioned, specific version numbers for PyTorch, Detectron2, or AdelaiDet are not provided in the text.
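Since the paper omits version numbers, a reproduction would need to record them locally. As an illustration only (not part of the paper or its codebase), a standard-library sketch that snapshots whichever of the cited packages are installed:

```python
from importlib import metadata

def record_versions(packages):
    """Map each package name to its installed version.

    Packages that are not installed are reported as "not installed",
    so the snapshot can be logged alongside experiment results.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions

# Package names below correspond to the libraries cited in the report;
# the resulting versions depend entirely on the local environment.
print(record_versions(["torch", "detectron2", "adelaidet"]))
```

Logging such a snapshot with every run is a cheap way to close the gap this report flags.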
Experiment Setup | Yes | We adopt the 1x schedule, which denotes 90k iterations of training, following the standard protocols in Detectron2 unless otherwise specified. For distillation, the hyper-parameter λ is set to 8 for one-stage detectors and 3 for two-stage detectors, respectively. To optimize the transformer decoder, we adopt the AdamW optimizer [33] for the decoder and MLPs, following common settings for transformers [43, 5]. The corresponding hyper-parameters follow [5], where the initial learning rate and weight decay are both set to 1e-4. We adopt a hidden dimension of 256 for the decoder and all MLPs, and the decoder has 8 attention heads in parallel.
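The setup and hardware rows together pin down the training budget. A back-of-the-envelope check, assuming the batch size implied by the hardware note (8 GPUs x 2 images) and the Detectron2 1x schedule of 90k iterations (the approximate size of COCO train2017, ~118k images, is an assumption not stated in the excerpt):

```python
# Rough training-budget arithmetic implied by the report's setup rows.
GPUS = 8
IMAGES_PER_GPU = 2
ITERATIONS = 90_000          # Detectron2 "1x" schedule
COCO_TRAIN_IMAGES = 118_000  # approximate size of train2017 (assumed)

batch_size = GPUS * IMAGES_PER_GPU        # total images per iteration
images_seen = ITERATIONS * batch_size     # images processed over training
epochs = images_seen / COCO_TRAIN_IMAGES  # approximate dataset passes

print(batch_size)        # 16
print(images_seen)       # 1440000
print(round(epochs, 1))  # roughly 12 epochs
```

This matches the common rule of thumb that the Detectron2 1x schedule corresponds to about 12 passes over COCO.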