ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training

Authors: Jianfei Chen, Lianmin Zheng, Zhewei Yao, Dequan Wang, Ion Stoica, Michael Mahoney, Joseph Gonzalez

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate ActNN on mainstream computer vision models for classification, detection, and segmentation tasks. On all these tasks, ActNN compresses the activation to 2 bits on average, with negligible accuracy loss. ActNN reduces the memory footprint of the activation by 12×, and it enables training with a 6.6× to 14× larger batch size. (A 2-bit quantization sketch follows the table.)
Researcher Affiliation | Academia | UC Berkeley. Correspondence to: Jianfei Chen <jianfeic@berkeley.edu>, Lianmin Zheng <lmzheng@berkeley.edu>.
Pseudocode | Yes | Figure 4. Pseudocode for activation compressed layers. (A sketch of the pattern follows the table.)
Open Source Code | Yes | We implement ActNN as a PyTorch library at https://github.com/ucbrise/actnn. (A usage sketch follows the table.)
Open Datasets | Yes | ResNet-56 (He et al., 2016b) on CIFAR-100 (Krizhevsky & Hinton, 2009), and ResNet-50 (He et al., 2016a) on ImageNet (Deng et al., 2009).
Dataset Splits | No | The paper uses the standard CIFAR-100 and ImageNet datasets, but the main text does not explicitly specify the training/validation/test splits (e.g., percentages or sample counts) needed for exact reproduction.
Hardware Specification | Yes | The experiments are done with PyTorch v1.7 and an AWS g4dn.4xlarge instance, which has a 16GB NVIDIA T4 GPU and 64GB CPU memory.
Software Dependencies | Yes | We implement ActNN as a library based on PyTorch (Paszke et al., 2019). The experiments are done with PyTorch v1.7...
Experiment Setup | Yes | The average number of bits is varied between {1, 1.25, 1.5, 1.75, 2, 2.5, 3, 4}. Each configuration is repeated 5 times on CIFAR-100, and once on ImageNet. ... ActNN can train the models with significantly larger batch size per GPU, and achieve good validation accuracy using only 2-bit activations. (A batch-size probe sketch follows the table.)
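
For context on the Research Type row: the 12× reduction comes from storing activations for the backward pass in a 2-bit format. The sketch below is a minimal per-group min-max quantizer in PyTorch, meant only to illustrate the idea; it is not ActNN's exact scheme (which uses stochastic rounding, heterogeneous bit allocation, and bit-packing), and the group size of 256 is an arbitrary choice for the sketch.

```python
import torch

def quantize_2bit(x: torch.Tensor, group_size: int = 256):
    """Quantize a tensor to 2-bit codes with per-group min-max scaling.

    Illustrative only: codes are kept one-per-byte here, whereas a real
    implementation would pack four 2-bit codes into each byte.
    """
    flat = x.reshape(-1)
    pad = (-flat.numel()) % group_size            # pad so groups divide evenly
    if pad:
        flat = torch.cat([flat, flat.new_zeros(pad)])
    groups = flat.reshape(-1, group_size)

    lo = groups.min(dim=1, keepdim=True).values
    scale = (groups.max(dim=1, keepdim=True).values - lo).clamp_min(1e-8) / 3
    q = torch.round((groups - lo) / scale).clamp(0, 3).to(torch.uint8)  # 4 levels
    return q, lo, scale, x.shape, pad

def dequantize_2bit(q, lo, scale, shape, pad):
    """Reconstruct an approximate float tensor from the 2-bit codes."""
    flat = (q.to(scale.dtype) * scale + lo).reshape(-1)
    if pad:
        flat = flat[:-pad]
    return flat.reshape(shape)

x = torch.randn(64, 128)
x_hat = dequantize_2bit(*quantize_2bit(x))
print((x - x_hat).abs().max())  # error bounded by about half a step per group
```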
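
The Pseudocode row quotes Figure 4's activation compressed layers. The general pattern is straightforward to express with `torch.autograd.Function`: run the layer normally in the forward pass, save only the compressed activation, and dequantize it in the backward pass. The sketch below reuses the hypothetical `quantize_2bit`/`dequantize_2bit` helpers above; it is an illustration of the pattern, not the library's actual code.

```python
import torch
import torch.nn.functional as F

class CompressedLinear(torch.autograd.Function):
    """A linear layer that keeps its saved input in 2-bit compressed form."""

    @staticmethod
    def forward(ctx, x, weight):
        ctx.packed = quantize_2bit(x)    # compressed copy kept for backward
        ctx.save_for_backward(weight)
        return F.linear(x, weight)       # forward math uses the exact input

    @staticmethod
    def backward(ctx, grad_out):
        (weight,) = ctx.saved_tensors
        x_hat = dequantize_2bit(*ctx.packed)   # approximate activation
        grad_x = grad_out @ weight             # does not need the activation
        grad_w = grad_out.t() @ x_hat          # uses the 2-bit reconstruction
        return grad_x, grad_w

x = torch.randn(32, 128, requires_grad=True)
w = torch.randn(64, 128, requires_grad=True)
CompressedLinear.apply(x, w).sum().backward()
print(w.grad.shape)  # gradient computed from the compressed activation
```

Only `grad_w` depends on the stored activation; the paper makes this gradient unbiased by using stochastic rounding in the quantizer, which the deterministic rounding above omits for brevity.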
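
The Open Source Code row links the actnn repository. Per that repository's README, converting an existing model is a one-line wrap; treat the exact API names below (`QModule`, `set_optimization_level`, the level string `"L3"`) as assumptions taken from the README that may have changed since publication.

```python
import torch
import torchvision
import actnn

# Choose a compression/overhead trade-off, then wrap the model so that
# supported layers store 2-bit activations for the backward pass.
actnn.set_optimization_level("L3")   # assumed level name from the README
model = actnn.QModule(torchvision.models.resnet50()).cuda()

x = torch.randn(64, 3, 224, 224, device="cuda")
model(x).sum().backward()
print(torch.cuda.max_memory_allocated() / 2**20, "MiB")
```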
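
The Experiment Setup row reports the largest trainable batch size per GPU. One plain-PyTorch way to measure this (a generic probe, not part of ActNN) is to double the batch size until CUDA reports out-of-memory; comparing the result with and without the compressed wrapper should reproduce the quoted 6.6× to 14× gap, hardware permitting.

```python
import torch

def max_batch_size(model, sample_shape, start=16, limit=8192):
    """Double the batch size until CUDA OOM; return the largest size that fit."""
    best, bs = 0, start
    while bs <= limit:
        try:
            model.zero_grad(set_to_none=True)
            x = torch.randn(bs, *sample_shape, device="cuda")
            model(x).sum().backward()   # backward holds the saved activations
            best, bs = bs, bs * 2
        except RuntimeError as e:       # CUDA OOM surfaces as a RuntimeError
            if "out of memory" not in str(e):
                raise
            break
        finally:
            torch.cuda.empty_cache()
    return best
```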