GACT: Activation Compressed Training for Generic Network Architectures

Authors: Xiaoxuan Liu, Lianmin Zheng, Dequan Wang, Yukuo Cen, Weize Chen, Xu Han, Jianfei Chen, Zhiyuan Liu, Jie Tang, Joey Gonzalez, Michael Mahoney, Alvin Cheung

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our evaluation shows that GACT can reduce activation memory by up to 8.1×, enabling training with a 24.7× larger batch size on the same GPU. (See the arithmetic sketch after this table.)
Researcher Affiliation | Collaboration | ¹UC Berkeley; ²Dept. of Comp. Sci. & Tech., Institute for AI, Tsinghua-Bosch Joint Center for ML, BNRist Center, State Key Lab for Intell. Tech. & Sys., Tsinghua University; ³ICSI; ⁴LBNL.
Pseudocode | Yes | Algorithm 1: numerical algorithm for computing c_l(h, θ). Require: a gradient evaluation function g(·; θ); a series of L+1 random seeds (r_l), l = 1, …, L+1; any compression scheme b = (b_l), l = 1, …, L. Step 1: ∀l′, seed Q^(l′) with r_{l′}; g_0 ← g(Q_b(h); θ) {first iteration}. Step 2: ∀l′, seed Q^(l′) with r_{l′}, then seed Q^(l) with r_{L+1}; g_1 ← g(Q_b(h); θ) {second iteration, with another seed for Q^(l)}. Return ½‖g_0 − g_1‖² / S(b_l). (A runnable sketch follows the table.)
Open Source Code | No | The paper implements GACT as a PyTorch library and shows a usage example (Figure 2), but it does not provide an explicit statement about, or a link to, public code availability. (An illustrative sketch of the underlying mechanism follows the table.)
Open Datasets | Yes | We conduct experiments on four node classification datasets with standard splits, including Flickr, Reddit, and Yelp from GraphSAINT (Zeng et al., 2019), and ogbn-arxiv from the Open Graph Benchmark (OGB) (Hu et al., 2020).
Dataset Splits | No | We report accuracy on validation sets (Div. indicates divergence) and the compression rate of context tensors (numbers in brackets) for both tasks. While validation sets are mentioned, explicit training/validation/test split percentages or sample counts are not provided in the paper's main text.
Hardware Specification | Yes | We implement the benchmark with PyTorch 1.10 and measure the memory saving and overhead of GACT on an AWS g4dn.4xlarge instance, which has a 16GB NVIDIA T4 GPU and 64GB of CPU memory.
Software Dependencies | Yes | We implement the benchmark with PyTorch 1.10 and measure the memory saving and overhead of GACT on an AWS g4dn.4xlarge instance...
Experiment Setup | Yes | All experiments are run with the same learning rate as full-precision training.
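
To make the headline numbers in the Research Type row concrete, here is a back-of-the-envelope sketch of how an activation compression ratio translates into batch-size headroom. All memory figures below (model state, per-sample activation cost) are assumed placeholders for illustration, not values from the paper; in this toy model the batch scales exactly with the compression ratio, so the paper's larger 24.7× batch gain presumably reflects effects this simple accounting ignores.

```python
# Illustrative arithmetic only (not from the paper): how activation
# compression turns into batch-size headroom. All numbers below are
# assumed placeholders, not GACT's measured values.

GPU_MEM_GB = 16.0          # e.g. the NVIDIA T4 used in the paper
MODEL_STATE_GB = 2.0       # weights + optimizer state (assumed fixed cost)
ACT_GB_PER_SAMPLE = 0.05   # full-precision activation memory per sample (assumed)

def max_batch(compression_ratio: float) -> int:
    """Largest batch that fits when activations shrink by `compression_ratio`."""
    free = GPU_MEM_GB - MODEL_STATE_GB
    return round(free / (ACT_GB_PER_SAMPLE / compression_ratio))

baseline = max_batch(1.0)    # FP32 activations
compressed = max_batch(8.1)  # the paper reports up to 8.1x activation reduction
print(baseline, compressed, compressed / baseline)  # -> 280 2268 8.1
```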
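
The reconstructed Algorithm 1 in the Pseudocode row is straightforward to sketch in code. Below is a minimal reconstruction under stated assumptions, not the authors' implementation: `grad_fn`, the quantizer objects with a `seed()` method, and `S_bl` are hypothetical interfaces standing in for GACT internals. It evaluates the gradient twice with identical quantizer seeds except for layer l's quantizer, so the squared difference of the two gradients isolates that layer's variance contribution.

```python
import torch

# A minimal sketch (our reconstruction, not the authors' code) of
# Algorithm 1: estimate the sensitivity c_l of layer l by running the
# backward pass twice, re-seeding every stochastic quantizer Q^(l')
# identically except Q^(l), which gets a fresh seed the second time.

def estimate_sensitivity(grad_fn, quantizers, seeds, l, S_bl):
    """c_l ~= ||g0 - g1||^2 / (2 * S(b_l)), per Algorithm 1.

    grad_fn:    runs forward + backward with compressed activations and
                returns the flattened parameter gradient (assumed helper).
    quantizers: list of L stochastic quantizers Q^(1..L), each with a
                .seed() method (assumed interface).
    seeds:      L+1 integers (r_1, ..., r_{L+1}).
    S_bl:       variance scale S(b_l) of the compression scheme at layer l.
    """
    # First iteration: seed every quantizer with its own r_{l'}.
    for q, r in zip(quantizers, seeds[:-1]):
        q.seed(r)
    g0 = grad_fn()

    # Second iteration: same seeds everywhere except quantizer l,
    # which is re-seeded with r_{L+1} so only its noise changes.
    for q, r in zip(quantizers, seeds[:-1]):
        q.seed(r)
    quantizers[l].seed(seeds[-1])
    g1 = grad_fn()

    # Only Q^(l)'s randomness differs between the runs, so the squared
    # gradient gap isolates (twice) the variance contributed by layer l.
    return 0.5 * torch.sum((g0 - g1) ** 2).item() / S_bl
```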
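
Since the paper's usage example (Figure 2) is not reproduced in this report, the sketch below illustrates the general mechanism of activation compressed training in plain PyTorch: a custom `autograd.Function` that stores an 8-bit quantized copy of the activation and dequantizes it during the backward pass. This is a hypothetical, simplified stand-in, not the GACT library API.

```python
import torch

# Self-contained illustration of activation compressed training (NOT
# the GACT API): save a quantized copy of the activation in forward,
# dequantize it on demand in backward.

class CompressedReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        y = torch.relu(x)
        # 8-bit affine quantization of the saved activation (illustrative).
        scale = y.abs().amax().clamp_min(1e-8) / 127.0
        ctx.scale = scale
        ctx.save_for_backward((y / scale).round().to(torch.int8))
        return y

    @staticmethod
    def backward(ctx, grad_out):
        (q,) = ctx.saved_tensors
        y = q.to(grad_out.dtype) * ctx.scale  # dequantize on demand
        return grad_out * (y > 0)             # ReLU gradient mask

x = torch.randn(4, 16, requires_grad=True)
CompressedReLU.apply(x).sum().backward()
print(x.grad.shape)  # torch.Size([4, 16])
```

The design point this illustrates: memory is saved at forward time by storing int8 instead of FP32 activations (roughly 4× here), at the cost of a slightly approximate gradient; controlling that approximation across layers is what the sensitivity estimate c_l in Algorithm 1 is for.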