LGD: Label-Guided Self-Distillation for Object Detection
Authors: Peizhen Zhang, Zijian Kang, Tong Yang, Xiangyu Zhang, Nanning Zheng, Jian Sun
AAAI 2022, pp. 3309-3317 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, LGD obtains decent results on various detectors, datasets, and extensive tasks like instance segmentation. For example, on the MS-COCO dataset, LGD improves RetinaNet with ResNet-50 under 2× single-scale training from 36.2% to 39.0% mAP (+2.8%). It boosts much stronger detectors like FCOS with ResNeXt-101-DCNv2 under 2× multi-scale training from 46.1% to 47.9% (+1.8%). Compared with FGFI, a classical teacher-based method, LGD not only performs better without requiring a pretrained teacher but also reduces training cost by 51% beyond the inherent student learning. Codes are available at https://github.com/megvii-research/LGD. |
| Researcher Affiliation | Collaboration | Peizhen Zhang*¹, Zijian Kang*², Tong Yang¹, Xiangyu Zhang¹, Nanning Zheng², Jian Sun¹ — ¹MEGVII Technology, ²Xi'an Jiaotong University. {zhangpeizhen, yangtong, zhangxiangyu, sunjian}@megvii.com, kzj123@stu.xjtu.edu.cn, nnzheng@mail.xjtu.edu.cn |
| Pseudocode | No | The paper describes its method in detailed text and mathematical equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Codes are available at https://github.com/megvii-research/LGD. |
| Open Datasets | Yes | Main experiments are validated on the MS-COCO (Lin et al. 2014) dataset; the method is also tested on Pascal VOC (Everingham et al. 2010) and CrowdHuman (Shao et al. 2018). |
| Dataset Splits | Yes | Following common protocol (He, Girshick, and Dollár 2019), we use the trainval-115k and minival-5k subsets for training and evaluation, respectively. We denote by 1× the training schedule of 90k iterations where the learning rate is divided by 10 at 60k and 80k iterations. By analogy, 2× denotes 180k iterations with milestones at 120k and 160k. |
| Hardware Specification | Yes | Experiments are run with batch size 16 on 8 GPUs. ... The examination is run on 8 Tesla V100 GPUs upon RetinaNet 2× ss R-50. |
| Software Dependencies | No | The proposed framework is built upon Detectron2 (Wu et al. 2019). The paper mentions software tools like Detectron2 but does not provide specific version numbers for any key software components or libraries used. |
| Experiment Setup | Yes | Experiments are run with batch size 16 on 8 GPUs. Inputs are resized such that shorter sides are no more than 800 pixels. We use the SGD optimizer with 0.9 momentum and 10⁻⁴ weight decay. The multi-head attention in the inter-object relation adapter uses T = 8 heads following common practice. ...We denote by 1× the training schedule of 90k iterations where the learning rate is divided by 10 at 60k and 80k iterations. By analogy, 2× denotes 180k iterations with milestones at 120k and 160k. ...For stable training, the distillation starts at 30k iterations since it could be detrimental when the instructive knowledge is insufficiently optimized (Hao et al. 2020; Liu et al. 2020). The student detector backbone is frozen for the first 10k iterations under the 1× training schedule and 20k for the 2× training schedule. |
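The 1×/2× step schedules and the delayed distillation start described in the setup row can be sketched as a small helper. This is a minimal illustration, not the authors' code; the base learning rate `base_lr=0.01` is an assumed placeholder, since the excerpt does not state it.

```python
def lr_at_iter(it, schedule="1x", base_lr=0.01):
    """Piecewise-constant step LR as in the quoted setup.

    1x: 90k iterations, LR divided by 10 at 60k and 80k.
    2x: 180k iterations, milestones at 120k and 160k.
    NOTE: base_lr is a hypothetical value for illustration.
    """
    milestones = {"1x": (60_000, 80_000), "2x": (120_000, 160_000)}[schedule]
    lr = base_lr
    for m in milestones:
        if it >= m:
            lr /= 10  # each milestone divides the LR by 10
    return lr


def distillation_active(it, start=30_000):
    """Distillation loss is enabled only from 30k iterations onward."""
    return it >= start
```

For example, under the 1× schedule the LR stays at its base value until 60k, drops to a tenth until 80k, and to a hundredth thereafter, while distillation only contributes to the loss once 30k iterations have passed.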