Teacher Guided Training: An Efficient Framework for Knowledge Transfer

Authors: Manzil Zaheer, Ankit Singh Rawat, Seungyeon Kim, Chong You, Himanshu Jain, Andreas Veit, Rob Fergus, Sanjiv Kumar

ICLR 2023

Reproducibility Assessment (Variable · Result · LLM Response)

Variable: Research Type
Result: Experimental
LLM Response: "We now conduct a comprehensive empirical study of our TGT framework in order to establish that TGT (i) leads to high accuracy in transferring knowledge in low-data/long-tail regimes (Section 4.1); (ii) effectively increases sample size (Section 4.2); and (iii) has wide adaptability even to discrete data domains such as text classification (Section 4.3) and retrieval (Section 4.4)."

Variable: Researcher Affiliation
Result: Industry
LLM Response: Google Research and DeepMind, New York, USA

Variable: Pseudocode
Result: No
LLM Response: The paper describes the methods textually and with equations, but does not include any labeled 'Algorithm' or 'Pseudocode' blocks.

Variable: Open Source Code
Result: No
LLM Response: The paper mentions using various pre-trained models and datasets from official repositories (e.g., 'We directly used teacher generator as BigBiGAN ResNet-50 checkpoint from the official repository https://github.com/deepmind/deepmind-research/tree/master/bigbigan.'), but it does not state that the code for the TGT framework itself is open-sourced, nor does it provide a link to its implementation.

Variable: Open Datasets
Result: Yes
LLM Response: "We evaluate TGT by training student models on three benchmark long-tail image classification datasets: ImageNet-LT (Liu et al., 2019c), SUN-LT (Patterson & Hays, 2012), Places-LT (Liu et al., 2019c)"

Variable: Dataset Splits
Result: No
LLM Response: The paper mentions training on the 'long-tail version of the datasets' and evaluating on 'balanced eval sets' or the 'entire eval set' (Fig 2), as well as subsampling strategies (e.g., an 'extremely sub-sampled version of Amazon-5 and Yelp-5 consisting of only 2.5k labeled examples'). However, it does not explicitly provide percentages or counts for training, validation, and test splits across all experiments, nor does it consistently reference predefined standard splits for reproducibility.

Variable: Hardware Specification
Result: No
LLM Response: The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used to run the experiments.

Variable: Software Dependencies
Result: No
LLM Response: The paper mentions various software components used, such as 'tensorflow-datasets', the 'SGD optimizer', the 'ADAM optimizer', and specific model checkpoints from an 'official repository', but it does not provide version numbers for these software dependencies (e.g., 'tensorflow vX.Y.Z').

Variable: Experiment Setup
Result: Yes
LLM Response: All hyper-parameters and the search grid are listed in Table 6: Num epochs, Optimizer, Schedule, Warm-up epochs, Peak learning rate, Batch size, Teacher labeler image size, Teacher generator image size, Student image size, Perturbation noise (σ), Gradient exploration step size (η), Num steps.
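To make the reported setup concrete, the hyper-parameter fields from Table 6 can be collected into a single config object. This is a minimal sketch: the field names come from the paper's Table 6, but every value below is a placeholder assumption, not the paper's actual setting.

```python
# Sketch of the Table 6 hyper-parameter fields as a config dict.
# Field names are from the paper; ALL values are placeholders, not
# the settings used in the paper's experiments.
tgt_config = {
    "num_epochs": 90,                        # placeholder
    "optimizer": "SGD",                      # paper's grid mentions SGD and ADAM
    "schedule": "cosine",                    # placeholder
    "warmup_epochs": 5,                      # placeholder
    "peak_learning_rate": 0.1,               # placeholder
    "batch_size": 256,                       # placeholder
    "teacher_labeler_image_size": 224,       # placeholder
    "teacher_generator_image_size": 128,     # placeholder
    "student_image_size": 224,               # placeholder
    "perturbation_noise_sigma": 0.1,         # σ in the paper; placeholder
    "gradient_exploration_step_size": 0.01,  # η in the paper; placeholder
    "gradient_exploration_num_steps": 10,    # placeholder
}
```

Recording the full field list this way makes it easy to see which knobs a reimplementation would need to sweep, even though the paper's exact grid values must be read off Table 6 itself.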