ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training
Authors: Jianfei Chen, Lianmin Zheng, Zhewei Yao, Dequan Wang, Ion Stoica, Michael Mahoney, Joseph Gonzalez
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate ActNN on mainstream computer vision models for classification, detection, and segmentation tasks. On all these tasks, ActNN compresses the activation to 2 bits on average, with negligible accuracy loss. ActNN reduces the memory footprint of the activation by 12x, and it enables training with a 6.6x to 14x larger batch size. |
| Researcher Affiliation | Academia | UC Berkeley. Correspondence to: Jianfei Chen <jianfeic@berkeley.edu>, Lianmin Zheng <lmzheng@berkeley.edu>. |
| Pseudocode | Yes | Figure 4. Pseudo code for activation compressed layers. (A hedged sketch of such a layer appears after this table.) |
| Open Source Code | Yes | We implement ActNN as a PyTorch library at https://github.com/ucbrise/actnn. |
| Open Datasets | Yes | ResNet-56 (He et al., 2016b) on CIFAR-100 (Krizhevsky & Hinton, 2009), and ResNet-50 (He et al., 2016a) on ImageNet (Deng et al., 2009). |
| Dataset Splits | No | The paper mentions using CIFAR-100 and ImageNet for experiments, which are standard datasets, but it does not explicitly specify the training/validation/test splits (e.g., percentages or sample counts) used for reproducibility in the main text. |
| Hardware Specification | Yes | The experiments are done with PyTorch v1.7 and an AWS g4dn.4xlarge instance, which has a 16GB NVIDIA T4 GPU and 64GB CPU memory. |
| Software Dependencies | Yes | We implement ActNN as a library based on PyTorch (Paszke et al., 2019). The experiments are done with PyTorch v1.7... |
| Experiment Setup | Yes | The average number of bits is varied between {1, 1.25, 1.5, 1.75, 2, 2.5, 3, 4}. Each configuration is repeated 5 times on CIFAR-100, and once on ImageNet. ... ActNN can train the models with significantly larger batch size per GPU, and achieve good validation accuracy using only 2-bit activations. |
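
The "Pseudocode" and "Open Source Code" rows refer to Figure 4 of the paper and the released actnn repository. As a rough illustration of the core idea only, the sketch below shows a PyTorch linear layer that stores its input activation in 2-bit form during the forward pass and dequantizes it when the backward pass needs it. All names here are hypothetical and the per-sample min/max quantizer is an assumption for brevity; the paper's actual scheme is more involved (e.g., per-group quantization and bit packing), and this is not the actnn API.

```python
# Minimal sketch of an "activation compressed layer" (illustrative only, not actnn's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


def quantize_2bit(x):
    """Quantize a tensor to 4 levels (2 bits) per sample using min/max scaling."""
    x_flat = x.reshape(x.shape[0], -1)
    mn = x_flat.min(dim=1, keepdim=True).values
    mx = x_flat.max(dim=1, keepdim=True).values
    scale = (mx - mn).clamp(min=1e-8) / 3.0                   # 4 levels -> 3 steps
    q = torch.round((x_flat - mn) / scale).to(torch.uint8)    # codes in {0, 1, 2, 3}
    return q, mn, scale, x.shape                              # (real ActNN packs codes into bytes)


def dequantize_2bit(q, mn, scale, shape):
    """Reconstruct an approximate activation from its 2-bit codes."""
    return (q.float() * scale + mn).reshape(shape)


class CompressedLinearFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, weight, bias):
        out = F.linear(x, weight, bias)
        # Save a compressed copy of the activation instead of the full-precision tensor.
        q, mn, scale, shape = quantize_2bit(x)
        ctx.save_for_backward(q, mn, scale, weight)
        ctx.x_shape = shape
        return out

    @staticmethod
    def backward(ctx, grad_out):
        q, mn, scale, weight = ctx.saved_tensors
        x_hat = dequantize_2bit(q, mn, scale, ctx.x_shape)     # approximate saved input
        grad_x = grad_out @ weight
        grad_w = grad_out.transpose(-2, -1) @ x_hat            # weight gradient uses the dequantized activation
        grad_b = grad_out.sum(dim=0)
        return grad_x, grad_w, grad_b


class CompressedLinear(nn.Linear):
    def forward(self, x):
        return CompressedLinearFunction.apply(x, self.weight, self.bias)
```

A layer built this way keeps only the 2-bit codes plus small per-sample scales between the forward and backward passes, which is where the memory savings quoted in the table come from; the released library applies this kind of substitution across convolution, normalization, and other layers rather than requiring manual rewrites.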