Instance-Conditional Knowledge Distillation for Object Detection
Authors: Zijian Kang, Peizhen Zhang, Xiangyu Zhang, Jian Sun, Nanning Zheng
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the efficacy of our method: we observe impressive improvements under various settings. We perform comprehensive experiments on challenging benchmarks. Results demonstrate impressive improvements over various detectors with up to 4 AP gain in MS-COCO, including recent detectors for instance segmentation [41, 46, 16]. |
| Researcher Affiliation | Collaboration | Zijian Kang, Xi'an Jiaotong University, kzj123@stu.xjtu.edu.cn; Peizhen Zhang, MEGVII Technology, zhangpeizhen@megvii.com; Xiangyu Zhang, MEGVII Technology, zhangxiangyu@megvii.com; Jian Sun, MEGVII Technology, sunjian@megvii.com; Nanning Zheng, Xi'an Jiaotong University, nnzheng@mail.xjtu.edu.cn |
| Pseudocode | No | The paper includes mathematical formulations and architectural diagrams (e.g., Figure 1, Figure 2), but does not contain structured pseudocode blocks or sections explicitly labeled 'Algorithm'. |
| Open Source Code | Yes | Code has been released on https://github.com/megvii-research/ICD. |
| Open Datasets | Yes | Most experiments are conducted on a large-scale object detection benchmark, MS-COCO [31], with 80 classes. MS-COCO is publicly available; the annotations are licensed under a Creative Commons Attribution 4.0 License, and the use of the images follows the Flickr Terms of Use. Refer to [31] for more details. |
| Dataset Splits | Yes | We train models on MS-COCO 2017 trainval115k subset and validate on minival subset. |
| Hardware Specification | Yes | All experiments are run on eight 2080Ti GPUs with 2 images per GPU. Specifically, we benchmark the 1× schedule on RetinaNet [30] with eight 2080Ti GPUs, following the same configuration in Section 4.2. |
| Software Dependencies | No | We conduct experiments on PyTorch [34] with the widely used Detectron2 library [47] and the AdelaiDet library [40]. While these software components are mentioned, specific version numbers for PyTorch, Detectron2, or AdelaiDet are not provided in the text. |
| Experiment Setup | Yes | We adopt the 1× schedule, which denotes 90k iterations of training, following the standard protocols in Detectron2 unless otherwise specified. For distillation, the hyper-parameter λ is set to 8 for one-stage detectors and 3 for two-stage detectors, respectively. To optimize the transformer decoder, we adopt the AdamW optimizer [33] for the decoder and MLPs, following common settings for transformers [43, 5]. The corresponding hyper-parameters follow [5], where the initial learning rate and weight decay are both set to 1e-4. We use a hidden dimension of 256 for the decoder and all MLPs; the decoder has 8 attention heads in parallel. (Hedged configuration sketches of this setup appear below the table.) |
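
For readers attempting reproduction, the following is a minimal PyTorch sketch of the decoder and optimizer hyper-parameters quoted in the Experiment Setup row: a 256-dimensional hidden size, 8 attention heads, and AdamW with learning rate and weight decay of 1e-4. The number of decoder layers, the MLP structure, and all names here are illustrative assumptions, not taken from the released ICD code.

```python
import torch
from torch import nn

# Hyper-parameters stated in the paper's setup.
HIDDEN_DIM = 256   # hidden dimension for the decoder and all MLPs
NUM_HEADS = 8      # parallel attention heads in the decoder

# Transformer decoder with the stated width and head count.
# The layer count is an assumption; the paper does not quote it here.
decoder_layer = nn.TransformerDecoderLayer(d_model=HIDDEN_DIM, nhead=NUM_HEADS)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=1)

# An MLP head sharing the 256-d hidden dimension (structure assumed).
mlp = nn.Sequential(
    nn.Linear(HIDDEN_DIM, HIDDEN_DIM),
    nn.ReLU(),
    nn.Linear(HIDDEN_DIM, HIDDEN_DIM),
)

# Per the quoted setup, AdamW is applied to the decoder and MLPs,
# with initial learning rate and weight decay both set to 1e-4.
optimizer = torch.optim.AdamW(
    list(decoder.parameters()) + list(mlp.parameters()),
    lr=1e-4,
    weight_decay=1e-4,
)
```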
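Similarly, here is a minimal sketch of how the distillation weight λ from the quoted setup could enter a training objective, assuming a simple additive combination of detection and distillation terms. The function and variable names (`total_loss`, `detection_loss`, `distillation_loss`) are hypothetical placeholders, not functions from the paper or its repository.

```python
import torch

# λ values quoted in the paper: 8 for one-stage detectors,
# 3 for two-stage detectors.
LAMBDA = {"one_stage": 8.0, "two_stage": 3.0}

def total_loss(detection_loss: torch.Tensor,
               distillation_loss: torch.Tensor,
               detector_type: str = "one_stage") -> torch.Tensor:
    """Combine the task loss with the λ-weighted distillation term.

    The additive form is an assumption for illustration; only the λ
    values themselves come from the paper's stated setup.
    """
    return detection_loss + LAMBDA[detector_type] * distillation_loss
```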