Revisiting Locally Supervised Learning: an Alternative to End-to-end Training

Authors: Yulin Wang, Zanlin Ni, Shiji Song, Le Yang, Gao Huang

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive empirical results on five datasets (i.e., CIFAR, SVHN, STL-10, ImageNet and Cityscapes) validate that InfoPro is capable of achieving competitive performance with less than 40% memory footprint compared to E2E training, while allowing using training data with higher-resolution or larger batch sizes under the same GPU memory constraint.
Researcher Affiliation | Academia | Yulin Wang, Zanlin Ni, Shiji Song, Le Yang & Gao Huang; Department of Automation, BNRist, Tsinghua University, Beijing, China; {wang-yl19, nzl17, yangle15}@mails.tsinghua.edu.cn; {shijis, gaohuang}@tsinghua.edu.cn
Pseudocode | No | The paper describes its algorithms and methods in prose and mathematical formulations, but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Code is available at: https://github.com/blackfeather-wang/InfoPro-Pytorch.
Open Datasets | Yes | Our experiments are based on five widely used datasets (i.e., CIFAR-10 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011), STL-10 (Coates et al., 2011), ImageNet (Deng et al., 2009) and Cityscapes (Cordts et al., 2016)).
Dataset Splits | Yes | The Cityscapes dataset (Cordts et al., 2016) contains 5,000 1024×2048 pixel-level finely annotated images (2,975/500/1,525 for training, validation and testing)...
Hardware Specification | Yes | Results of training ResNet-110 on a single NVIDIA Titan Xp GPU are reported. We use 8 Tesla V100 GPUs for training. 2 NVIDIA GeForce RTX 3090 GPUs are used for training.
Software Dependencies | No | The paper mentions 'PyTorch' in the code link 'InfoPro-Pytorch' and discusses the SGD and Adam optimizers, but it does not specify version numbers for PyTorch or any other software libraries, environments, or solvers used for the experiments.
Experiment Setup | Yes | The networks are trained using an SGD optimizer with a Nesterov momentum of 0.9 for 160 epochs. The L2 weight decay ratio is set to 1e-4. For ResNets, the batch size is set to 1024 and 128 for CIFAR-10/SVHN and STL-10, associated with an initial learning rate of 0.8 and 0.1, respectively. For DenseNets, we use a batch size of 256 and an initial learning rate of 0.2. The cosine learning rate annealing is adopted. (See the configuration sketch below.)
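For concreteness, the CIFAR-10/SVHN ResNet configuration quoted above maps onto standard PyTorch components roughly as follows. This is a minimal sketch, not the authors' released code: the torchvision ResNet-18 stand-in, the variable names, and the elided data-loading loop are illustrative assumptions; only the optimizer and learning-rate schedule follow the hyperparameters reported in the table.

```python
# Minimal sketch of the reported CIFAR-10/SVHN training configuration.
# The model is a stand-in (the paper trains ResNet-32/110); only the
# optimizer and schedule reflect the hyperparameters quoted above.
import torch
from torchvision.models import resnet18  # assumed placeholder backbone

model = resnet18(num_classes=10)

# SGD with Nesterov momentum 0.9 and L2 weight decay 1e-4, as stated in the setup.
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.8,              # initial learning rate for CIFAR-10/SVHN with batch size 1024
    momentum=0.9,
    nesterov=True,
    weight_decay=1e-4,
)

# Cosine learning-rate annealing over the full 160-epoch schedule.
epochs = 160
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    # ... one pass over the training loader (batch size 1024) would go here ...
    scheduler.step()
```

For the other settings quoted in the table, only the numbers change under the same pattern: batch size 128 with initial learning rate 0.1 for STL-10 ResNets, and batch size 256 with initial learning rate 0.2 for DenseNets.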