Revisiting Locally Supervised Learning: an Alternative to End-to-end Training

Authors: Yulin Wang, Zanlin Ni, Shiji Song, Le Yang, Gao Huang

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive empirical results on five datasets (i.e., CIFAR, SVHN, STL-10, ImageNet and Cityscapes) validate that InfoPro is capable of achieving competitive performance with less than 40% memory footprint compared to E2E training, while allowing using training data with higher-resolution or larger batch sizes under the same GPU memory constraint.
Researcher Affiliation | Academia | Yulin Wang, Zanlin Ni, Shiji Song, Le Yang & Gao Huang; Department of Automation, BNRist, Tsinghua University, Beijing, China; {wang-yl19, nzl17, yangle15}@mails.tsinghua.edu.cn; {shijis, gaohuang}@tsinghua.edu.cn
Pseudocode | No | The paper describes its algorithms and methods in prose and mathematical formulations, but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Code is available at: https://github.com/blackfeather-wang/InfoPro-Pytorch.
Open Datasets | Yes | Our experiments are based on five widely used datasets (i.e., CIFAR-10 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011), STL-10 (Coates et al., 2011), ImageNet (Deng et al., 2009) and Cityscapes (Cordts et al., 2016)).
Dataset Splits | Yes | The Cityscapes dataset (Cordts et al., 2016) contains 5,000 1024×2048 pixel-level finely annotated images (2,975/500/1,525 for training, validation and testing)...
Hardware Specification | Yes | Results of training ResNet-110 on a single NVIDIA Titan Xp GPU are reported. We use 8 Tesla V100 GPUs for training. 2 NVIDIA GeForce RTX 3090 GPUs are used for training.
Software Dependencies | No | The paper mentions 'PyTorch' in the code link 'InfoPro-Pytorch' and discusses the SGD and Adam optimizers, but it does not specify version numbers for PyTorch or any other software libraries, environments, or solvers used for the experiments.
Experiment Setup | Yes | The networks are trained using an SGD optimizer with a Nesterov momentum of 0.9 for 160 epochs. The L2 weight decay ratio is set to 1e-4. For ResNets, the batch size is set to 1024 and 128 for CIFAR-10/SVHN and STL-10, associated with an initial learning rate of 0.8 and 0.1, respectively. For DenseNets, we use a batch size of 256 and an initial learning rate of 0.2. The cosine learning rate annealing is adopted. (See the configuration sketch below.)
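For concreteness, the CIFAR-10/SVHN ResNet configuration quoted above maps onto standard PyTorch components roughly as follows. This is a minimal sketch, not the authors' released code: the torchvision ResNet-18 stand-in, the variable names, and the elided data-loading loop are illustrative assumptions; only the optimizer and learning-rate schedule follow the hyperparameters reported in the table.

```python
# Minimal sketch of the reported CIFAR-10/SVHN training configuration.
# The model is a stand-in (the paper trains ResNet-32/110); only the
# optimizer and schedule reflect the hyperparameters quoted above.
import torch
from torchvision.models import resnet18  # assumed placeholder backbone

model = resnet18(num_classes=10)

# SGD with Nesterov momentum 0.9 and L2 weight decay 1e-4, as stated in the setup.
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.8,              # initial learning rate for CIFAR-10/SVHN with batch size 1024
    momentum=0.9,
    nesterov=True,
    weight_decay=1e-4,
)

# Cosine learning-rate annealing over the full 160-epoch schedule.
epochs = 160
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    # ... one pass over the training loader (batch size 1024) would go here ...
    scheduler.step()
```

For the other settings quoted in the table, only the numbers change under the same pattern: batch size 128 with initial learning rate 0.1 for STL-10 ResNets, and batch size 256 with initial learning rate 0.2 for DenseNets.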