Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Backprop with Approximate Activations for Memory-efficient Network Training
Authors: Ayan Chakrabarti, Benjamin Moseley
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on CIFAR-10, CIFAR-100, and ImageNet show that our method yields performance close to exact training, while storing activations compactly with as low as 4-bit precision. |
| Researcher Affiliation | Academia | Ayan Chakrabarti, Washington University in St. Louis, 1 Brookings Dr., St. Louis, MO 63130, EMAIL; Benjamin Moseley, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213, EMAIL |
| Pseudocode | No | The paper describes the computational steps using mathematical equations and descriptive text, but it does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our reference implementation is available at http://projects.ayanc.org/blpa/. |
| Open Datasets | Yes | Experiments on CIFAR-10, CIFAR-100, and ImageNet show that our method yields performance close to exact training... We begin with comparisons on 164-layer pre-activation residual networks [9] on CIFAR-10 and CIFAR-100 [13]... For ImageNet [18], we train models with a 152-layer residual architecture... |
| Dataset Splits | Yes | For ImageNet, ... Table 1 reports top-5 validation accuracy (using 10 crops at a scale of 256) for models trained using exact computation, and our approach with K = 8 and K = 4 bit approximations. |
| Hardware Specification | Yes | For the CIFAR experiments, we were able to fit the full 128-size batch on a single 1080Ti GPU... caused an out-of-memory error on a 1080Ti GPU. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | We train the network for 64k iterations with a batch size of 128, momentum of 0.9, and weight decay of 2e-4. Following [9], the learning rate is set to 1e-2 for the first 400 iterations, then increased to 1e-1, and dropped by a factor of 10 at 32k and 48k iterations. We use standard data-augmentation with random translation and horizontal flips. |
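
The learning-rate schedule quoted in the Experiment Setup row can be written as a small piecewise-constant function. The sketch below is only an illustration of that description, not the authors' reference implementation. The function name `learning_rate`, the constant names, the reading of "k" as 1000, 0-indexed iterations, and the choice that each change takes effect exactly at the stated iteration are all assumptions.

```python
# Hyperparameters as quoted in the Experiment Setup row.
# The names are hypothetical; "64k" is assumed to mean 64,000.
TOTAL_ITERATIONS = 64_000
BATCH_SIZE = 128
MOMENTUM = 0.9
WEIGHT_DECAY = 2e-4


def learning_rate(iteration: int) -> float:
    """Return the learning rate for a 0-indexed training iteration.

    Assumed boundaries: warm-up for iterations 0..399, then the base
    rate, then a 10x drop at iteration 32,000 and again at 48,000.
    """
    if iteration < 0:
        raise ValueError("iteration must be non-negative")
    if iteration < 400:
        return 1e-2  # warm-up phase
    if iteration < 32_000:
        return 1e-1  # base rate
    if iteration < 48_000:
        return 1e-2  # first drop by a factor of 10
    return 1e-3      # second drop by a factor of 10


if __name__ == "__main__":
    # Print the rate at a few iterations around each boundary.
    for it in (0, 399, 400, 31_999, 32_000, 47_999, 48_000, TOTAL_ITERATIONS - 1):
        print(it, learning_rate(it))
```

Returning explicit literals for each phase, rather than dividing repeatedly by 10, keeps the values free of floating-point rounding and makes the boundaries easy to check against the quoted text.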