Scaling Supervised Local Learning with Augmented Auxiliary Networks
Authors: Chenxiang Ma, Jibin Wu, Chenyang Si, Kay Chen Tan
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments on four image classification datasets (i.e., CIFAR-10, SVHN, STL-10, and ImageNet) demonstrate that AugLocal can effectively scale up to tens of local layers with a comparable accuracy to BP-trained networks while reducing GPU memory usage by around 40%. |
| Researcher Affiliation | Academia | Chenxiang Ma¹, Jibin Wu¹, Chenyang Si², Kay Chen Tan¹. ¹The Hong Kong Polytechnic University, Hong Kong SAR, China; ²Nanyang Technological University, Singapore |
| Pseudocode | No | No pseudocode or clearly labeled algorithm block was found in the paper. |
| Open Source Code | Yes | Code is available at https://github.com/ChenxiangMA/AugLocal. |
| Open Datasets | Yes | Our experiments are based on four widely used benchmark datasets (i.e., CIFAR-10 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011), STL-10 (Coates et al., 2011), and ImageNet (Deng et al., 2009)). |
| Dataset Splits | Yes | The CIFAR-10 (Krizhevsky et al., 2009) dataset consists of 60K 32×32 colored images that are categorized into 10 classes, with 50K images for training and 10K images for testing. ... The standard split of 73,257 images for training and 26,032 images for testing is adopted. ... Table 3: Results on the validation set of ImageNet. |
| Hardware Specification | Yes | All experiments are conducted on a machine equipped with 10 NVIDIA RTX 3090 GPUs. |
| Software Dependencies | No | No specific software dependencies with version numbers are provided. The paper only mentions 'We re-implement all of these methods in PyTorch using their official implementations'. |
| Experiment Setup | Yes | For CIFAR-10, SVHN, and STL-10 experiments using ResNet-32 (He et al., 2016), ResNet-110 (He et al., 2016), and VGG19 (Simonyan & Zisserman, 2014), we use the SGD optimizer with a Nesterov momentum of 0.9 and the L2 weight decay factor of 1e-4. We adopt a batch size of 1024 on CIFAR-10 and SVHN and a batch size of 128 on STL-10. We train the networks for 400 epochs, setting the initial learning rate to 0.8 for CIFAR-10/SVHN and 0.1 for STL-10, with the cosine annealing scheduler (Loshchilov & Hutter, 2019). For ImageNet experiments, we train VGG13 (Simonyan & Zisserman, 2014) with an initial learning rate of 0.1 for 90 epochs, and train ResNet-34 (He et al., 2016) and ResNet-101 (He et al., 2016) with initial learning rates of 0.4 and 0.2 for 200 epochs, respectively. We set batch sizes of VGG13, ResNet-34, and ResNet-101 to 256, 1024, and 512, respectively. |
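
To make the Dataset Splits row concrete, below is a minimal sketch of how the standard CIFAR-10 and SVHN train/test splits could be loaded with torchvision. Only the split sizes (50K/10K for CIFAR-10, 73,257/26,032 for SVHN) and the 1024 batch size come from the paper; the data root, transforms, and worker count are illustrative assumptions, not details from the authors' code.

```python
# Hypothetical sketch: loading the standard CIFAR-10 and SVHN splits with torchvision.
# Paths, transforms, and num_workers are placeholder assumptions; only the split sizes
# and the 1024 batch size are taken from the paper.
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

transform = T.Compose([T.ToTensor()])  # placeholder; the paper's augmentations are not reproduced here

cifar_train = torchvision.datasets.CIFAR10("./data", train=True, download=True, transform=transform)
cifar_test = torchvision.datasets.CIFAR10("./data", train=False, download=True, transform=transform)
svhn_train = torchvision.datasets.SVHN("./data", split="train", download=True, transform=transform)
svhn_test = torchvision.datasets.SVHN("./data", split="test", download=True, transform=transform)

print(len(cifar_train), len(cifar_test))  # 50000 10000
print(len(svhn_train), len(svhn_test))    # 73257 26032

train_loader = DataLoader(cifar_train, batch_size=1024, shuffle=True, num_workers=4)
test_loader = DataLoader(cifar_test, batch_size=1024, shuffle=False, num_workers=4)
```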
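The Experiment Setup row can likewise be expressed as an optimizer and scheduler configuration. The following is a minimal sketch assuming a standard PyTorch training loop; `model`, `train_loader`, and `device` are hypothetical placeholders, and only the hyperparameters (SGD with Nesterov momentum 0.9, weight decay 1e-4, initial learning rate 0.8, 400 epochs, cosine annealing) are taken from the paper's CIFAR-10/SVHN setup. Note that this sketch uses plain end-to-end backpropagation; it does not implement AugLocal's layer-wise training with augmented auxiliary networks.

```python
# Minimal sketch of the reported CIFAR-10/SVHN optimization setup (not the authors' code).
# `model`, `train_loader`, and `device` are hypothetical placeholders.
import torch
import torch.nn as nn

def train_cifar_svhn(model, train_loader, device="cuda", epochs=400):
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    # Paper: SGD with Nesterov momentum 0.9, L2 weight decay 1e-4, initial LR 0.8 on CIFAR-10/SVHN.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.8, momentum=0.9,
                                weight_decay=1e-4, nesterov=True)
    # Paper: cosine annealing schedule over the full 400-epoch run.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    for _ in range(epochs):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()  # plain BP here; AugLocal instead trains each layer with its own auxiliary network
            optimizer.step()
        scheduler.step()
    return model
```

For the ImageNet runs, the same structure would apply with the paper's other settings (e.g., VGG13 at LR 0.1 for 90 epochs, ResNet-34 at LR 0.4 and ResNet-101 at LR 0.2 for 200 epochs, with batch sizes 256, 1024, and 512, respectively).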