ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training
Authors: Jianfei Chen, Lianmin Zheng, Zhewei Yao, Dequan Wang, Ion Stoica, Michael Mahoney, Joseph Gonzalez
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate ActNN on mainstream computer vision models for classification, detection, and segmentation tasks. On all these tasks, ActNN compresses the activation to 2 bits on average, with negligible accuracy loss. ActNN reduces the memory footprint of the activation by 12x, and it enables training with a 6.6x to 14x larger batch size. |
| Researcher Affiliation | Academia | UC Berkeley. Correspondence to: Jianfei Chen <jianfeic@berkeley.edu>, Lianmin Zheng <lmzheng@berkeley.edu>. |
| Pseudocode | Yes | Figure 4. Pseudo code for activation compressed layers. (A hedged sketch of such a layer appears after this table.) |
| Open Source Code | Yes | We implement ActNN as a PyTorch library at https://github.com/ucbrise/actnn. |
| Open Datasets | Yes | ResNet-56 (He et al., 2016b) on CIFAR-100 (Krizhevsky & Hinton, 2009), and ResNet-50 (He et al., 2016a) on ImageNet (Deng et al., 2009). |
| Dataset Splits | No | The paper mentions using CIFAR-100 and ImageNet for experiments, which are standard datasets, but it does not explicitly specify the training/validation/test splits (e.g., percentages or sample counts) used for reproducibility in the main text. |
| Hardware Specification | Yes | The experiments are done with PyTorch v1.7 and an AWS g4dn.4xlarge instance, which has a 16GB NVIDIA T4 GPU and 64GB CPU memory. |
| Software Dependencies | Yes | We implement ActNN as a library based on PyTorch (Paszke et al., 2019). The experiments are done with PyTorch v1.7... |
| Experiment Setup | Yes | The average number of bits is varied between {1, 1.25, 1.5, 1.75, 2, 2.5, 3, 4}. Each configuration is repeated 5 times on CIFAR-100, and once on ImageNet. ... ActNN can train the models with significantly larger batch size per GPU, and achieve good validation accuracy using only 2-bit activations. |
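
The "Pseudocode" and "Open Source Code" rows refer to Figure 4 of the paper and the released actnn repository. As a rough illustration of the core idea only, the sketch below shows a PyTorch linear layer that stores its input activation in 2-bit form during the forward pass and dequantizes it when the backward pass needs it. All names here are hypothetical and the per-sample min/max quantizer is an assumption for brevity; the paper's actual scheme is more involved (e.g., per-group quantization and bit packing), and this is not the actnn API.

```python
# Minimal sketch of an "activation compressed layer" (illustrative only, not actnn's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


def quantize_2bit(x):
    """Quantize a tensor to 4 levels (2 bits) per sample using min/max scaling."""
    x_flat = x.reshape(x.shape[0], -1)
    mn = x_flat.min(dim=1, keepdim=True).values
    mx = x_flat.max(dim=1, keepdim=True).values
    scale = (mx - mn).clamp(min=1e-8) / 3.0                   # 4 levels -> 3 steps
    q = torch.round((x_flat - mn) / scale).to(torch.uint8)    # codes in {0, 1, 2, 3}
    return q, mn, scale, x.shape                              # (real ActNN packs codes into bytes)


def dequantize_2bit(q, mn, scale, shape):
    """Reconstruct an approximate activation from its 2-bit codes."""
    return (q.float() * scale + mn).reshape(shape)


class CompressedLinearFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, weight, bias):
        out = F.linear(x, weight, bias)
        # Save a compressed copy of the activation instead of the full-precision tensor.
        q, mn, scale, shape = quantize_2bit(x)
        ctx.save_for_backward(q, mn, scale, weight)
        ctx.x_shape = shape
        return out

    @staticmethod
    def backward(ctx, grad_out):
        q, mn, scale, weight = ctx.saved_tensors
        x_hat = dequantize_2bit(q, mn, scale, ctx.x_shape)     # approximate saved input
        grad_x = grad_out @ weight
        grad_w = grad_out.transpose(-2, -1) @ x_hat            # weight gradient uses the dequantized activation
        grad_b = grad_out.sum(dim=0)
        return grad_x, grad_w, grad_b


class CompressedLinear(nn.Linear):
    def forward(self, x):
        return CompressedLinearFunction.apply(x, self.weight, self.bias)
```

A layer built this way keeps only the 2-bit codes plus small per-sample scales between the forward and backward passes, which is where the memory savings quoted in the table come from; the released library applies this kind of substitution across convolution, normalization, and other layers rather than requiring manual rewrites.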