LGD: Label-Guided Self-Distillation for Object Detection
Authors: Peizhen Zhang, Zijian Kang, Tong Yang, Xiangyu Zhang, Nanning Zheng, Jian Sun
AAAI 2022, pp. 3309-3317 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, LGD obtains decent results on various detectors, datasets, and extensive tasks like instance segmentation. For example, on the MS-COCO dataset, LGD improves RetinaNet with ResNet-50 under 2× single-scale training from 36.2% to 39.0% mAP (+2.8%). It boosts much stronger detectors like FCOS with ResNeXt-101-DCNv2 under 2× multi-scale training from 46.1% to 47.9% (+1.8%). Compared with FGFI, a classical teacher-based method, LGD not only performs better without requiring a pretrained teacher but also reduces training cost by 51% beyond the inherent student learning. Codes are available at https://github.com/megvii-research/LGD. |
| Researcher Affiliation | Collaboration | Peizhen Zhang*¹, Zijian Kang*², Tong Yang¹, Xiangyu Zhang¹, Nanning Zheng², Jian Sun¹ — ¹MEGVII Technology, ²Xi'an Jiaotong University. {zhangpeizhen, yangtong, zhangxiangyu, sunjian}@megvii.com, kzj123@stu.xjtu.edu.cn, nnzheng@mail.xjtu.edu.cn |
| Pseudocode | No | The paper describes its method in detailed text and mathematical equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Codes are available at https://github.com/megvii-research/LGD. |
| Open Datasets | Yes | Main experiments are validated on the MS-COCO (Lin et al. 2014) dataset; the method is also tested on Pascal VOC (Everingham et al. 2010) and CrowdHuman (Shao et al. 2018). |
| Dataset Splits | Yes | Following common protocol (He, Girshick, and Dollár 2019), we use the trainval-115k and minival-5k subsets for training and evaluation, respectively. We denote by 1× the training schedule of 90k iterations where the learning rate is divided by 10 at 60k and 80k iterations. By analogy, 2× denotes 180k iterations with milestones at 120k and 160k. |
| Hardware Specification | Yes | Experiments are run with batch size 16 on 8 GPUs. ... The examination is run on 8 Tesla V100 GPUs upon RetinaNet 2× ss R-50. |
| Software Dependencies | No | The proposed framework is built upon Detectron2 (Wu et al. 2019). The paper mentions software tools like Detectron2 but does not provide specific version numbers for any key software components or libraries used. |
| Experiment Setup | Yes | Experiments are run with batch size 16 on 8 GPUs. Inputs are resized such that shorter sides are no more than 800 pixels. We use the SGD optimizer with 0.9 momentum and 10⁻⁴ weight decay. The multi-head attention in the inter-object relation adapter uses T = 8 heads following common practice. ...We denote by 1× the training schedule of 90k iterations where the learning rate is divided by 10 at 60k and 80k iterations. By analogy, 2× denotes 180k iterations with milestones at 120k and 160k. ...For stable training, the distillation starts at 30k iterations since it could be detrimental when the instructive knowledge is insufficiently optimized (Hao et al. 2020; Liu et al. 2020). The student detector backbone is frozen for the first 10k iterations under the 1× training schedule and 20k for the 2× training schedule. |
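The 1×/2× step schedules and the delayed distillation start described in the setup row can be sketched as a small helper. This is a minimal illustration, not the authors' code; the base learning rate `base_lr=0.01` is an assumed placeholder, since the excerpt does not state it.

```python
def lr_at_iter(it, schedule="1x", base_lr=0.01):
    """Piecewise-constant step LR as in the quoted setup.

    1x: 90k iterations, LR divided by 10 at 60k and 80k.
    2x: 180k iterations, milestones at 120k and 160k.
    NOTE: base_lr is a hypothetical value for illustration.
    """
    milestones = {"1x": (60_000, 80_000), "2x": (120_000, 160_000)}[schedule]
    lr = base_lr
    for m in milestones:
        if it >= m:
            lr /= 10  # each milestone divides the LR by 10
    return lr


def distillation_active(it, start=30_000):
    """Distillation loss is enabled only from 30k iterations onward."""
    return it >= start
```

For example, under the 1× schedule the LR stays at its base value until 60k, drops to a tenth until 80k, and to a hundredth thereafter, while distillation only contributes to the loss once 30k iterations have passed.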