Backprop with Approximate Activations for Memory-efficient Network Training

Authors: Ayan Chakrabarti, Benjamin Moseley

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on CIFAR-10, CIFAR-100, and ImageNet show that our method yields performance close to exact training, while storing activations compactly with as low as 4-bit precision.
Researcher Affiliation | Academia | Ayan Chakrabarti, Washington University in St. Louis, 1 Brookings Dr., St. Louis, MO 63130, ayan@wustl.edu; Benjamin Moseley, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213, moseleyb@andrew.cmu.edu
Pseudocode | No | The paper describes the computational steps using mathematical equations and descriptive text, but it does not include any pseudocode or algorithm blocks. (A minimal illustrative sketch of the idea is given after the table.)
Open Source Code | Yes | Our reference implementation is available at http://projects.ayanc.org/blpa/.
Open Datasets | Yes | Experiments on CIFAR-10, CIFAR-100, and ImageNet show that our method yields performance close to exact training... We begin with comparisons on 164-layer pre-activation residual networks [9] on CIFAR-10 and CIFAR-100 [13]... For ImageNet [18], we train models with a 152-layer residual architecture...
Dataset Splits | Yes | For ImageNet, ... Table 1 reports top-5 validation accuracy (using 10 crops at a scale of 256) for models trained using exact computation, and our approach with K = 8 and K = 4 bit approximations. (A sketch of this ten-crop evaluation protocol is given after the table.)
Hardware Specification | Yes | For the CIFAR experiments, we were able to fit the full 128-size batch on a single 1080Ti GPU... caused an out-of-memory error on a 1080Ti GPU.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | Yes | We train the network for 64k iterations with a batch size of 128, momentum of 0.9, and weight decay of 2e-4. Following [9], the learning rate is set to 1e-2 for the first 400 iterations, then increased to 1e-1, and dropped by a factor of 10 at 32k and 48k iterations. We use standard data-augmentation with random translation and horizontal flips. (A sketch of this schedule is given after the table.)
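As noted in the Pseudocode row, the paper presents its procedure through equations rather than pseudocode. The following is only a minimal PyTorch sketch of the underlying idea reported in the abstract (keep a low-precision copy of each activation for the backward pass while the forward pass stays exact). The quantize/dequantize helpers and the ApproxLinear function are our own illustrative names and simplifications, not the authors' scheme; their reference implementation is at http://projects.ayanc.org/blpa/.

```python
import torch

def quantize(x, k):
    # Uniform k-bit quantization of x over its observed range (a simplification).
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / (2 ** k - 1) + 1e-12
    q = torch.round((x - lo) / scale).to(torch.uint8)   # values in [0, 2^k - 1] fit in uint8
    return q, lo, scale

def dequantize(q, lo, scale):
    return q.float() * scale + lo

class ApproxLinear(torch.autograd.Function):
    """y = x @ W^T, storing only a k-bit copy of the input x for the backward pass."""

    @staticmethod
    def forward(ctx, x, weight, k):
        q, lo, scale = quantize(x, k)
        ctx.save_for_backward(q, lo, scale, weight)      # compact storage instead of full-precision x
        return x @ weight.t()                            # forward output stays exact

    @staticmethod
    def backward(ctx, grad_out):
        q, lo, scale, weight = ctx.saved_tensors
        x_approx = dequantize(q, lo, scale)              # reconstruct the stored activation
        grad_x = grad_out @ weight                       # gradient w.r.t. the input
        grad_w = grad_out.t() @ x_approx                 # weight gradient uses the approximate x
        return grad_x, grad_w, None

# Usage: y = ApproxLinear.apply(x, weight, 4) for K = 4 bit activation storage.
```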
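The Dataset Splits row quotes a ten-crop validation protocol. Below is a hedged sketch of the conventional ten-crop top-5 evaluation in torchvision; the 224-pixel crop size and the averaging of logits over crops are assumptions about the usual protocol, since the quoted text only states "10 crops at a scale of 256". The `loader` is assumed to yield ImageNet validation batches transformed with `eval_tf`.

```python
import torch
from torchvision import transforms

# Resize the short side to 256, then take the standard ten 224x224 crops per image.
eval_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.TenCrop(224),
    transforms.Lambda(lambda crops: torch.stack([transforms.ToTensor()(c) for c in crops])),
])

@torch.no_grad()
def top5_accuracy(model, loader):
    model.eval()
    correct = total = 0
    for images, labels in loader:                 # images: (B, 10, 3, 224, 224)
        b, ncrops, c, h, w = images.shape
        logits = model(images.view(-1, c, h, w))
        logits = logits.view(b, ncrops, -1).mean(dim=1)   # average logits over the ten crops
        top5 = logits.topk(5, dim=1).indices
        correct += (top5 == labels.unsqueeze(1)).any(dim=1).sum().item()
        total += b
    return correct / total
```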
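Read directly from the Experiment Setup row, here is a hedged sketch of the CIFAR-10 training schedule in PyTorch. The 4-pixel-padded random crop is our assumption for "random translation", and resnet18 is only a stand-in for the paper's 164-layer pre-activation ResNet, which is not reproduced here.

```python
import torch
from torch import nn, optim
import torchvision
from torchvision import datasets, transforms

# Assumed standard CIFAR augmentation: padded random crop + horizontal flip.
train_tf = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
train_set = datasets.CIFAR10("data", train=True, download=True, transform=train_tf)
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# Stand-in model; the paper uses a 164-layer pre-activation residual network.
model = torchvision.models.resnet18(num_classes=10)
opt = optim.SGD(model.parameters(), lr=1e-2, momentum=0.9, weight_decay=2e-4)
loss_fn = nn.CrossEntropyLoss()

def lr_at(step):
    # 1e-2 for the first 400 iterations, then 1e-1, dropped by 10x at 32k and 48k.
    if step < 400:
        return 1e-2
    if step < 32_000:
        return 1e-1
    if step < 48_000:
        return 1e-2
    return 1e-3

step = 0
while step < 64_000:
    for x, y in loader:
        for group in opt.param_groups:
            group["lr"] = lr_at(step)
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
        step += 1
        if step == 64_000:
            break
```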