Revisiting Locally Supervised Learning: an Alternative to End-to-end Training
Authors: Yulin Wang, Zanlin Ni, Shiji Song, Le Yang, Gao Huang
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive empirical results on five datasets (i.e., CIFAR, SVHN, STL-10, ImageNet and Cityscapes) validate that InfoPro is capable of achieving competitive performance with less than 40% of the memory footprint of E2E training, while allowing the use of training data with higher resolution or larger batch sizes under the same GPU memory constraint. |
| Researcher Affiliation | Academia | Yulin Wang, Zanlin Ni, Shiji Song, Le Yang & Gao Huang; Department of Automation, BNRist, Tsinghua University, Beijing, China; {wang-yl19, nzl17, yangle15}@mails.tsinghua.edu.cn, {shijis, gaohuang}@tsinghua.edu.cn |
| Pseudocode | No | The paper describes algorithms and methods in prose and mathematical formulations, but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code is available at: https://github.com/blackfeather-wang/InfoPro-Pytorch. |
| Open Datasets | Yes | Our experiments are based on five widely used datasets (i.e., CIFAR-10 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011), STL-10 (Coates et al., 2011), ImageNet (Deng et al., 2009) and Cityscapes (Cordts et al., 2016)). |
| Dataset Splits | Yes | The Cityscapes dataset (Cordts et al., 2016) contains 5,000 1024×2048 pixel-level finely annotated images (2,975/500/1,525 for training, validation and testing)... |
| Hardware Specification | Yes | Results of training ResNet-110 on a single Nvidia Titan Xp GPU are reported. We use 8 Tesla V100 GPUs for training. 2 Nvidia GeForce RTX 3090 GPUs are used for training. |
| Software Dependencies | No | The paper mentions 'Pytorch' in the code link 'InfoPro-Pytorch' and discusses 'SGD optimizer' and 'Adam optimizer'. However, it does not specify version numbers for PyTorch or any other software libraries, environments, or solvers used for the experiments. |
| Experiment Setup | Yes | The networks are trained using an SGD optimizer with a Nesterov momentum of 0.9 for 160 epochs. The L2 weight decay ratio is set to 1e-4. For ResNets, the batch size is set to 1024 and 128 for CIFAR-10/SVHN and STL-10, associated with an initial learning rate of 0.8 and 0.1, respectively. For DenseNets, we use a batch size of 256 and an initial learning rate of 0.2. Cosine learning rate annealing is adopted. (A minimal configuration sketch follows the table.) |
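
The reported optimizer and schedule settings map directly onto standard PyTorch components. Below is a minimal sketch of that configuration, assuming a generic PyTorch training loop; the backbone, hyperparameter choices shown (ResNet on CIFAR-10/SVHN, batch size 1024, initial learning rate 0.8) and the loop body are placeholders for illustration, not the authors' InfoPro implementation, which additionally applies local supervision per network stage.

```python
# Sketch of the reported training configuration (SGD + Nesterov momentum,
# weight decay 1e-4, 160 epochs, cosine learning-rate annealing).
import torch
import torchvision

# Placeholder backbone; the paper trains ResNet/DenseNet variants.
model = torchvision.models.resnet18(num_classes=10)

# SGD with Nesterov momentum 0.9 and L2 weight decay 1e-4, as reported.
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.8,               # initial LR for ResNets on CIFAR-10/SVHN (batch size 1024)
    momentum=0.9,
    nesterov=True,
    weight_decay=1e-4,
)

# Cosine learning-rate annealing over the 160 training epochs.
epochs = 160
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    # ... forward/backward passes over the training set go here ...
    scheduler.step()
```

Per the quoted setup, the initial learning rate is scaled with the batch size: 0.8 for ResNets at batch size 1024 (CIFAR-10/SVHN), 0.1 at batch size 128 (STL-10), and 0.2 for DenseNets at batch size 256.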