Teacher Guided Training: An Efficient Framework for Knowledge Transfer
Authors: Manzil Zaheer, Ankit Singh Rawat, Seungyeon Kim, Chong You, Himanshu Jain, Andreas Veit, Rob Fergus, Sanjiv Kumar
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now conduct a comprehensive empirical study of our TGT framework in order to establish that TGT (i) leads to high accuracy in transferring knowledge in low data/long-tail regimes (Section 4.1); (ii) effectively increases sample size (Section 4.2); and (iii) has wide adaptability even to discrete data domains such as text classification (Section 4.3) and retrieval (Section 4.4). |
| Researcher Affiliation | Industry | Google Research and DeepMind, New York, USA |
| Pseudocode | No | The paper describes the methods textually and with equations, but does not include any labeled 'Algorithm' or 'Pseudocode' blocks. |
| Open Source Code | No | The paper mentions using various pre-trained models and datasets from official repositories (e.g., 'We directly used teacher generator as BigBiGAN ResNet-50 checkpoint from the official repository https://github.com/deepmind/deepmind-research/tree/master/bigbigan.'), but it does not state that the code for the TGT framework itself is open-sourced or provide a link to its implementation. |
| Open Datasets | Yes | We evaluate TGT by training student models on three benchmark long-tail image classification datasets: ImageNet-LT (Liu et al., 2019c), SUN-LT (Patterson & Hays, 2012), Places-LT (Liu et al., 2019c) |
| Dataset Splits | No | The paper mentions training on the 'long-tail version of the datasets', evaluating on 'balanced eval sets' or the 'entire eval set' (Fig. 2), and subsampling strategies (e.g., an 'extremely sub-sampled version of Amazon-5 and Yelp-5 consisting of only 2.5k labeled examples'), but it does not explicitly provide the percentages or counts for the training/validation/test splits across all experiments, nor does it consistently reference predefined standard splits for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or specific cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper mentions various software components and libraries used, such as 'tensorflow-datasets', 'SGD optimizer', 'ADAM optimizer', and references specific model checkpoints from 'official repository', but it does not provide specific version numbers for these software dependencies (e.g., 'tensorflow vX.Y.Z'). |
| Experiment Setup | Yes | All hyper-parameters and grids are listed in Table 6: Num epochs, Optimizer, Schedule, Warm-up epochs, Peak learning rate, Batch size, Teacher labeler image size, Teacher generator image size, Student image size, Perturbation noise (σ), Gradient exploration step size (η), Num steps (illustrative sketches of this configuration follow the table). |
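
For orientation, the fields quoted from Table 6 map onto a training configuration roughly like the skeleton below. This is a hypothetical sketch: the field names come from the paper's Table 6, but every value is a placeholder of our own, not a setting reported in the paper.

```python
# Hypothetical configuration skeleton mirroring the hyper-parameter names the
# paper lists in Table 6. All values below are placeholders, NOT the paper's
# reported numbers.
tgt_config = {
    "num_epochs": 90,                            # placeholder
    "optimizer": "sgd",                          # paper mentions SGD and ADAM optimizers
    "schedule": "cosine",                        # placeholder schedule family
    "warmup_epochs": 5,                          # placeholder
    "peak_learning_rate": 0.1,                   # placeholder
    "batch_size": 256,                           # placeholder
    "teacher_labeler_image_size": 224,           # placeholder
    "teacher_generator_image_size": 128,         # placeholder
    "student_image_size": 224,                   # placeholder
    "perturbation_noise_sigma": 0.1,             # σ in Table 6; placeholder value
    "gradient_exploration_step_size_eta": 0.01,  # η in Table 6; placeholder value
    "gradient_exploration_num_steps": 5,         # placeholder
}
```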
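The σ, η, and num-steps fields suggest a latent-space exploration step around teacher-generated samples. Below is a minimal sketch of what such a step could look like in JAX; the `teacher_loss_fn` interface, the gradient direction, and all default values are our assumptions, not the authors' published implementation.

```python
import jax
import jax.numpy as jnp

def explore_latent(z0, teacher_loss_fn, key, sigma=0.1, eta=0.01, num_steps=5):
    """Hypothetical latent-space exploration: jitter a latent code with
    Gaussian noise (sigma), then take num_steps gradient steps of size eta
    on a teacher-defined scalar loss. Interface and gradient direction are
    assumptions, not the paper's code."""
    z = z0 + sigma * jax.random.normal(key, z0.shape)  # perturbation noise (σ)
    grad_fn = jax.grad(teacher_loss_fn)
    for _ in range(num_steps):                         # gradient exploration
        z = z + eta * grad_fn(z)                       # step size (η)
    return z

# Usage with a toy quadratic "teacher loss" (purely illustrative):
key = jax.random.PRNGKey(0)
z0 = jnp.zeros((8,))
z_new = explore_latent(z0, lambda z: -jnp.sum(z ** 2), key)
```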