Better Supervisory Signals by Observing Learning Paths
Authors: Yi Ren, Shangmin Guo, Danica J. Sutherland
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To further support this hypothesis, we conduct experiments on a synthetic Gaussian problem (Figure 1(a); details in Appendix C), where we can easily calculate p(y | x) for each sample. |
| Researcher Affiliation | Academia | Yi Ren UBC renyi.joshua@gmail.com Shangmin Guo University of Edinburgh s.guo@ed.ac.uk Danica J. Sutherland UBC and Amii dsuth@cs.ubc.ca |
| Pseudocode | Yes | Algorithm 1: Filter-KD. |
| Open Source Code | Yes | Code, including the experiments producing the figures and a Filter-KD implementation, is available at https://github.com/Joshua-Ren/better_supervisory_signal. |
| Open Datasets | Yes | The CIFAR10H dataset (Peterson et al., 2019) is one attempt at a different p_tar, using multiple human annotators to estimate p_tar. ... We visualize the learning path of data points while training a ResNet18 (He et al., 2016) on CIFAR10 ... The experiments are conducted on CIFAR (Figure 7) and Tiny ImageNet (Table 1) |
| Dataset Splits | Yes | We early-stop the student's training in all settings. ... ESKD uses a teacher stopped early based on validation accuracy. ... Check the early stopping criterion with the help of a validation set. ... make a train/valid/test split with ratio [0.05, 0.05, 0.9] |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for experiments. It only mentions general computing resources like 'WestGrid, and Compute Canada'. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or specific library versions). |
| Experiment Setup | Yes | We train an MLP with 3 hidden layers, each with 128 hidden units and ReLU activations. ... we visualize the learning path of data points while training a ResNet18 (He et al., 2016) on CIFAR10 for 200 epochs. ... we focus on self-distillation and a fixed temperature τ = 1 ... α controls the cut-off frequency of low-pass filter (0.05 here). |
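The Filter-KD setup quoted above (self-distillation, τ = 1, α = 0.05 as a low-pass cut-off) amounts to exponentially averaging the model's softmax outputs across epochs and using the smoothed probabilities as soft targets. Below is a minimal NumPy sketch of that filtering step only; the EMA form `s ← (1 − α)·s + α·p` and the function name are assumptions for illustration, not the authors' exact implementation (see their repository linked in the table for that).

```python
import numpy as np

def ema_soft_labels(pred_history, alpha=0.05):
    """Low-pass filter a sequence of per-epoch softmax predictions.

    Each step blends the newest prediction into a running average:
        s <- (1 - alpha) * s + alpha * p
    A small alpha (0.05 in the paper's setting) gives a low cut-off
    frequency, smoothing out epoch-to-epoch oscillations in the
    model's predictions for a given sample.
    """
    s = pred_history[0].copy()
    for p in pred_history[1:]:
        s = (1.0 - alpha) * s + alpha * p
    return s

# Toy example: one 3-class sample whose per-epoch predictions oscillate
# between two distributions; the filtered soft label settles in between.
preds = [np.array([0.9, 0.05, 0.05]) if t % 2 == 0
         else np.array([0.2, 0.7, 0.1]) for t in range(50)]
soft = ema_soft_labels(preds, alpha=0.05)
```

Because each update is a convex combination of probability vectors, the filtered label remains a valid distribution, which is what lets it serve directly as a supervisory signal.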