Backpropagation-Free Deep Learning with Recursive Local Representation Alignment

Authors: Alexander G. Ororbia, Ankur Mali, Daniel Kifer, C. Lee Giles

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments with residual networks on CIFAR-10 and the large benchmark, ImageNet, show that our algorithm generalizes as well as backprop while converging sooner due to weight updates that are parallelizable and computationally less demanding. This is empirical evidence that a backprop-free algorithm can scale up to larger datasets."
Researcher Affiliation | Academia | Rochester Institute of Technology, Rochester, NY 14623, USA; University of South Florida, Tampa, FL 33620, USA; The Pennsylvania State University, State College, PA 16801, USA
Pseudocode | Yes | "Algorithm 1: Rec-LRA (depth 2) for fΘ(x) w/ residual gap g." (an illustrative local-update sketch follows the table)
Open Source Code | Yes | "We made our code available at https://github.com/alexororbia/rec_lra" (from the appendix, linked in footnote 3 on page 3 of the main paper)
Open Datasets | Yes | MNIST & Fashion MNIST: "This dataset contains 28x28 images with gray-scale pixel values, i.e., range is [0, 255]. ... Fashion MNIST (FMNIST) (Xiao, Rasul, and Vollgraf 2017)..."; CIFAR-10: "The CIFAR-10 dataset has 50,000 training and 10,000 test images, across 10 categories. Images are of size 32x32 pixels."; ImageNet: "The large-scale benchmark ImageNet (Russakovsky et al. 2015), specifically the ILSVRC-2010 subset, contains over 1.2 million images..."
Dataset Splits | Yes | "For both [MNIST/FMNIST], training had 60000 samples, testing had 10000, and 2000 validation samples were drawn from the training set."; CIFAR-10: "The CIFAR-10 dataset has 50,000 training and 10,000 test images, across 10 categories. Images are of size 32x32 pixels. 5,000 training samples were set aside to measure validation metrics." (a split example follows the table)
Hardware Specification | Yes | "Notably, in terms of total training run-time over 90 epochs using a small set of 8 V100 GPUs, the backprop ResNet took 3 hours and 45 minutes (min) to train (speed was about 2.5-2.7 min/epoch) while rec-LRA took 2.127 min/epoch, training over the course of 3 hours and 12 min." (an arithmetic check follows the table)
Software Dependencies | Yes | Python 3.7 and PyTorch 1.8.1 with CUDA 11.1 (stated in the appendix, which the main paper references for software details; a version-check snippet follows the table)
Experiment Setup | Yes | "For the rec-LRA results, we report 4 variations (all had 5 layers, 256 units each), each using a different activation function." and "The initial learning rate was set to 10^-2 and a polynomial decay scheme was used to decay the learning rate up to 1e-5 (with a warm-up schedule set to 40). We trained models with varying batch sizes and report the validation error in Table 2 (b)..." (a schedule sketch follows the table)
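
The quoted Algorithm 1 is the paper's recursive LRA procedure for a network with a residual gap. As a rough illustration of the general local-representation-alignment idea it builds on (per-layer targets formed by projecting the output error through separate error weights, followed by purely local weight updates), a minimal NumPy sketch is given below. The layer sizes, tanh activation, error-weight initialization, and the coefficient beta are illustrative assumptions; this is not the paper's exact recursive Algorithm 1.

```python
# Illustrative LRA-style local update for one hidden layer.
# NOT the paper's exact recursive Algorithm 1; dimensions and constants are assumed.
import numpy as np

rng = np.random.default_rng(0)

def phi(v):
    # tanh activation (assumed; the paper reports several activation choices)
    return np.tanh(v)

# Toy dimensions (assumed): 784 inputs, 256 hidden units, 10 classes
W1 = rng.normal(0.0, 0.05, (256, 784))   # forward weights, input -> hidden
W2 = rng.normal(0.0, 0.05, (10, 256))    # forward weights, hidden -> output
E2 = rng.normal(0.0, 0.05, (256, 10))    # error-projection weights used to form the hidden target

def lra_step(x, y_onehot, lr=1e-2, beta=0.1):
    """One LRA-style local update: project the output error backward through E2
    to form a local target for the hidden layer, then update each weight matrix
    using only quantities available at that layer (no end-to-end backprop pass)."""
    h1 = W1 @ x                      # hidden pre-activation
    z1 = phi(h1)                     # hidden representation
    z2 = W2 @ z1                     # linear readout (logits)

    e2 = z2 - y_onehot               # output mismatch
    t1 = phi(h1 - beta * (E2 @ e2))  # local target for the hidden layer
    e1 = z1 - t1                     # local mismatch at the hidden layer

    W2_new = W2 - lr * np.outer(e2, z1)   # local outer-product update
    W1_new = W1 - lr * np.outer(e1, x)    # local outer-product update
    return W1_new, W2_new

# Toy usage with random data
x = rng.normal(size=784)
y = np.zeros(10); y[3] = 1.0
W1, W2 = lra_step(x, y)
```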
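
The splits quoted in the table (60,000/10,000 for MNIST/FMNIST with 2,000 held-out validation samples, and 45,000/5,000/10,000 for CIFAR-10) are standard. The snippet below is a minimal sketch of the CIFAR-10 case; the use of torchvision and the fixed split seed are assumptions, since the paper only states PyTorch.

```python
# Minimal sketch: carve a 5,000-sample validation set out of CIFAR-10's
# 50,000 training images, matching the split quoted above.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()  # scales pixel values from [0, 255] to [0, 1]

train_full = datasets.CIFAR10(root="./data", train=True, download=True, transform=to_tensor)
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=to_tensor)

train_set, val_set = random_split(
    train_full, [45_000, 5_000],
    generator=torch.Generator().manual_seed(0),  # fixed seed (assumed) for a reproducible split
)

print(len(train_set), len(val_set), len(test_set))  # 45000 5000 10000
```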
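
As a quick arithmetic check on the quoted run-times (this check is ours, not from the paper):

```python
# Consistency check on the quoted training times over 90 epochs.
epochs = 90
backprop_total_min = 3 * 60 + 45             # 225 min for the backprop ResNet
print(backprop_total_min / epochs)           # 2.5 min/epoch, within the quoted 2.5-2.7 range
rec_lra_total_min = 2.127 * epochs           # ~191.4 min
print(divmod(round(rec_lra_total_min), 60))  # (3, 11) -> roughly 3 hours 11-12 min, matching the quote
```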
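
A minimal sanity check of the stated software stack, assuming a CUDA build of PyTorch is installed:

```python
# Verify the Python/PyTorch/CUDA versions reported in the appendix.
import sys
import torch

print(sys.version.split()[0])   # expected: 3.7.x
print(torch.__version__)        # expected: 1.8.1 (e.g. '1.8.1+cu111')
print(torch.version.cuda)       # expected: '11.1' for a CUDA 11.1 build
```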
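
The quoted optimization settings (initial learning rate 1e-2, polynomial decay to 1e-5, warm-up set to 40) can be sketched as below. The linear warm-up form, the polynomial power of 1.0, the 90-epoch horizon, and whether the warm-up is counted in epochs or steps are all assumptions, since the quote does not specify them.

```python
# Hedged sketch of the quoted schedule: warm-up for 40 steps/epochs (unit assumed),
# then polynomial decay from 1e-2 to 1e-5. Power and total horizon are assumptions.
def lr_at(t, warmup=40, total=90, lr_init=1e-2, lr_end=1e-5, power=1.0):
    if t < warmup:
        return lr_init * (t + 1) / warmup                 # linear warm-up (assumed form)
    frac = (t - warmup) / max(1, total - 1 - warmup)      # progress through the decay phase
    frac = min(frac, 1.0)
    return (lr_init - lr_end) * (1.0 - frac) ** power + lr_end

for t in (0, 20, 40, 65, 89):
    print(t, f"{lr_at(t):.2e}")   # ramps up to 1e-2, then decays toward 1e-5
```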