Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation

Authors: Shengcao Cao, Mengtian Li, James Hays, Deva Ramanan, Yu-Xiong Wang, Liangyan Gui

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform comprehensive empirical evaluation on the challenging MS COCO dataset and observe consistent gains, regardless of the distillation loss complexity (from a simple feature-matching loss in Table 3 to the most advanced, sophisticated losses in Figure 4). MTPD learns lightweight RetinaNet and Mask R-CNN with state-of-the-art accuracy, even in heterogeneous backbone and input resolution settings. Perhaps most impressively, for the first time, we investigate heterogeneous distillation from Transformer-based teacher detectors to a convolution-based student, and find progressive distillation is the key to bridge their gap (Figure 1, Table 5).
Researcher Affiliation | Collaboration | (1) University of Illinois Urbana-Champaign, (2) Carnegie Mellon University, (3) Now at Waymo, (4) Georgia Institute of Technology. Correspondence to: Shengcao Cao <cao44@illinois.edu>.
Pseudocode | Yes | We design a heuristic algorithm, Backward Greedy Selection (BGS), to acquire a near-optimal distillation order O automatically (see pseudo-code in Algorithm 1 and illustration in Figure 3).
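The quoted description can be illustrated with a minimal sketch of a backward greedy ordering heuristic in the spirit of Algorithm 1. This is not the paper's exact procedure: the selection criterion (`distance` below) is a hypothetical stand-in for whatever model-gap measure BGS actually uses, and the toy scalar "capacities" exist only for the usage example.

```python
def backward_greedy_selection(student, teachers, distance):
    """Build a distillation order ending at the student.

    Starting from the student, repeatedly pick the remaining teacher
    closest to the most recently selected model, then reverse the
    sequence so distillation proceeds strongest teacher -> ... -> student.
    """
    remaining = list(teachers)
    order = []            # built backward, outward from the student
    current = student
    while remaining:
        nearest = min(remaining, key=lambda t: distance(current, t))
        remaining.remove(nearest)
        order.append(nearest)
        current = nearest
    order.reverse()       # most distant teacher distills first
    return order

# Toy usage: scalar "capacities" as a hypothetical distance proxy.
models = {"student": 1.0, "t1": 2.0, "t2": 4.0, "t3": 8.0}
dist = lambda a, b: abs(models[a] - models[b])
print(backward_greedy_selection("student", ["t1", "t2", "t3"], dist))
# -> ['t3', 't2', 't1']
```

The greedy choice keeps each consecutive teacher-student pair in the chain as similar as possible, which is the intuition behind progressive distillation bridging large capacity gaps.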
Open Source Code | Yes | Code available at https://github.com/Shengcao-Cao/MTPD.
Open Datasets | Yes | We mainly evaluate on the challenging object detection dataset MS COCO 2017 (Lin et al., 2014), which contains bounding boxes and segmentation masks for 80 common object categories. We train our models on the split of train2017 (118k images) and report results on val2017 (5k images). We also evaluate on another object detection dataset Argoverse-HD (Chang et al., 2019), and a more challenging evaluation protocol in streaming perception (Li et al., 2020a).
Dataset Splits | Yes | We train our models on the split of train2017 (118k images) and report results on val2017 (5k images).
Hardware Specification | Yes | The second column denotes the optimal input resolution (that maximizes streaming accuracy). First, we discover that a lighter model and full-resolution input is much more helpful than having an accurate but complex model that needs to downsize input resolution. Second, MTPD further improves over the lightweight model. (Table 14 also mentions a Tesla V100 GPU for the streaming accuracy experiments.)
Software Dependencies | No | We implement detectors and their distillation using the MMDetection codebase (Chen et al., 2019b). While MMDetection is mentioned, no specific version number for it or other software libraries (e.g., Python, PyTorch) is provided.
Experiment Setup | Yes | We train on 8 GPUs for 12 epochs for each distillation. For MS COCO, we use the standard input resolution of 1,333 × 800, with each GPU hosting 2 images...We use an initial learning rate of 0.01 (for RetinaNet students) or 0.02 (for Mask R-CNN students). We use stochastic gradient descent and a momentum of 0.9. For the simple feature-matching loss (see Section 3.1), we perform a grid search over the hyper-parameter λ. While the optimal values are dependent on the architectures of the teacher and student models, we find that the performance is not very sensitive to λ between 0.3 and 0.8. We set λ = 0.5 for RetinaNet students and λ = 0.8 for Mask R-CNN students.
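The λ-weighted feature-matching term described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: it omits channel-alignment adapters and the exact feature locations at which matching is applied, and simply sums a mean-squared error over pyramid levels before scaling by λ.

```python
import numpy as np

def feature_matching_loss(student_feats, teacher_feats, lam=0.5):
    """Simple feature-matching distillation term.

    student_feats / teacher_feats: lists of same-shaped feature maps,
    one per pyramid level. Returns lam * sum of per-level MSE; in
    training this term is added to the student's detection loss.
    lam follows the quoted settings (0.5 for RetinaNet students,
    0.8 for Mask R-CNN students).
    """
    match = sum(
        float(np.mean((s - t) ** 2))
        for s, t in zip(student_feats, teacher_feats)
    )
    return lam * match

# Usage: one 2x2 level where student and teacher differ by 1 everywhere.
s = [np.zeros((2, 2))]
t = [np.ones((2, 2))]
print(feature_matching_loss(s, t, lam=0.5))
# -> 0.5
```

The reported insensitivity of results to λ in [0.3, 0.8] is consistent with this term acting as a soft regularizer alongside the detection loss rather than dominating it.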