Robust Knowledge Transfer via Hybrid Forward on the Teacher-Student Model

Authors: Liangchen Song, Jialian Wu, Ming Yang, Qian Zhang, Yuan Li, Junsong Yuan

AAAI 2021, pp. 2558-2566 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our method on a variety of tasks, e.g., model compression, segmentation, and detection, under a variety of knowledge transfer settings. ... To validate the effectiveness of our proposed method, we conduct experiments on four different settings, as introduced in the introduction of the paper (Fig. 1).
Researcher Affiliation | Collaboration | ¹University at Buffalo, ²Horizon Robotics, ³Google; {lsong8,jialianw,jsyuan}@buffalo.edu, m-yang4@u.northwestern.edu, qian01.zhang@horizon.ai, liyu@google.com
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We begin our experiments with the gap between the teacher and the student: the model gap. Such a setting is also known as distillation-based model compression. In the experiments, a teacher network that is well-trained on ImageNet (Deng et al. 2009) will be used to guide a shallower student network. ... The GTA5 dataset has 24966 images and we randomly select 500 images out as the validation set for training the teacher network. Apart from the above, to better investigate which gap is more challenging, we employ a multi-task setting on the Cityscapes dataset. ... The dataset for the student network is CityPersons (Zhang, Benenson, and Schiele 2017), which uses the images from Cityscapes with the pedestrians manually relabeled.
Dataset Splits | Yes | The GTA5 dataset has 24966 images and we randomly select 500 images out as the validation set for training the teacher network. ... Since we are interested in whether the knowledge from the teacher can help the student, we first present results on the Cityscapes validation set with the above two teachers. ... In Tab. 3, we show the results on the GTA5 validation set, which was split off previously to acquire a teacher network on GTA5. ... The results on the CityPersons validation set are presented in Tab. 4.
Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU models, CPU types, or memory) used for running the experiments. It only mentions general concepts like 'mobile phone' or 'server' in a theoretical context.
Software Dependencies | No | The paper mentions 'Torchvision' and, implicitly, PyTorch, but it does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | For training hyper-parameters, we use the same parameters as (Heo et al. 2019a): Batch size is set to 256; Learning rate is initialized with 0.1 and decayed by 0.1 every 30 epochs. ... we use the same training hyper-parameters: batch size of 8, 40000 iterations and learning rate starting from 1e-3 with polynomial decay.
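The two learning-rate schedules quoted above can be written down explicitly. This is a minimal sketch, not the authors' code: the step schedule matches the stated "0.1, decayed by 0.1 every 30 epochs", and the polynomial schedule matches "1e-3 with polynomial decay over 40000 iterations"; the decay exponent `power=0.9` is an assumption (a common default for segmentation training) since the paper does not state it.

```python
def step_lr(base_lr, epoch, step_size=30, gamma=0.1):
    """Step schedule: multiply the base LR by `gamma` every `step_size` epochs.

    Matches the quoted compression setting: base_lr=0.1, decay 0.1 / 30 epochs.
    """
    return base_lr * gamma ** (epoch // step_size)


def poly_lr(base_lr, iteration, max_iters=40000, power=0.9):
    """Polynomial schedule: LR decays from `base_lr` to 0 over `max_iters`.

    Matches the quoted segmentation setting (base_lr=1e-3, 40000 iterations);
    `power=0.9` is an assumed value, not stated in the paper.
    """
    return base_lr * (1.0 - iteration / max_iters) ** power


# Example: LR at a few points in each schedule.
lrs_step = [step_lr(0.1, e) for e in (0, 30, 60)]   # [0.1, 0.01, 0.001]
lrs_poly = [poly_lr(1e-3, i) for i in (0, 40000)]   # [0.001, 0.0]
```

In PyTorch these correspond to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)` and a `LambdaLR` wrapping the polynomial factor.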