Better Supervisory Signals by Observing Learning Paths

Authors: Yi Ren, Shangmin Guo, Danica J. Sutherland

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To further support this hypothesis, we conduct experiments on a synthetic Gaussian problem (Figure 1 (a); details in Appendix C), where we can easily calculate p(y|x) for each sample.
Researcher Affiliation | Academia | Yi Ren, UBC, renyi.joshua@gmail.com; Shangmin Guo, University of Edinburgh, s.guo@ed.ac.uk; Danica J. Sutherland, UBC and Amii, dsuth@cs.ubc.ca
Pseudocode | Yes | Algorithm 1: Filter-KD.
Open Source Code | Yes | Code, including the experiments producing the figures and a Filter-KD implementation, is available at https://github.com/Joshua-Ren/better_supervisory_signal.
Open Datasets | Yes | The CIFAR10H dataset (Peterson et al., 2019) is one attempt at a different p_tar, using multiple human annotators to estimate p_tar. ... We visualize the learning path of data points while training a ResNet18 (He et al., 2016) on CIFAR10 ... The experiments are conducted on CIFAR (Figure 7) and Tiny ImageNet (Table 1)
Dataset Splits | Yes | We early-stop the student's training in all settings. ... ESKD uses a teacher stopped early based on validation accuracy. ... Check the early stopping criterion with the help of a validation set. ... make a train/valid/test split with ratio [0.05, 0.05, 0.9]
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for experiments. It only mentions general computing resources like 'WestGrid, and Compute Canada'.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or specific library versions).
Experiment Setup | Yes | We train an MLP with 3 hidden layers, each with 128 hidden units and ReLU activations. ... we visualize the learning path of data points while training a ResNet18 (He et al., 2016) on CIFAR10 for 200 epochs. ... we focus on self-distillation and a fixed temperature τ = 1 ... α controls the cut-off frequency of low-pass filter (0.05 here).
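
The Experiment Setup entry describes α as controlling the cut-off frequency of a low-pass filter, with α = 0.05. The sketch below shows one common way such a filter could be applied to a sample's per-epoch predictions: a first-order exponential moving average. This is an illustration only, not the authors' Filter-KD implementation (see the linked repository for that). The function names `lowpass_update` and `smooth_learning_path`, the choice of an exponential moving average, and the initialisation from the first prediction are all assumptions made for this example.

```python
def lowpass_update(q_smooth, q_t, alpha=0.05):
    """One exponential-moving-average step over class probabilities.

    Assumption: the low-pass filter is a first-order EMA, where a smaller
    alpha gives heavier smoothing (a lower cut-off frequency).
    """
    return [alpha * qt + (1.0 - alpha) * qs for qs, qt in zip(q_smooth, q_t)]


def smooth_learning_path(path, alpha=0.05):
    """Filter the sequence of per-epoch predictions for a single sample.

    `path` is a list of probability vectors, one per epoch.
    Assumption: the filter state starts at the first prediction.
    """
    q_smooth = list(path[0])
    for q_t in path[1:]:
        q_smooth = lowpass_update(q_smooth, q_t, alpha)
    return q_smooth


if __name__ == "__main__":
    # Hypothetical 3-class learning path that oscillates between two classes.
    path = [
        [0.8, 0.1, 0.1],
        [0.2, 0.7, 0.1],
        [0.7, 0.2, 0.1],
        [0.3, 0.6, 0.1],
    ]
    print(smooth_learning_path(path, alpha=0.05))
```

Because each step is a convex combination of two probability vectors, the filtered output remains a valid distribution, so under this assumption it could be used directly as a soft target for a student at temperature τ = 1.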