AC-GC: Lossy Activation Compression with Guaranteed Convergence

Authors: R. David Evans, Tor M. Aamodt

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We examine activation compression by modifying the Chainer framework [53] to compress and decompress activations during training. We measure compression rates every 100 iterations, and otherwise perform paired compression/decompression to maintain the highest performance for our experiments. We focus our analysis on CNNs with image and text datasets, as they have large activation memory requirements, but avoid the largest networks [22, 52] due to limited resources. We create a performance implementation based on Chen et al. [7] to measure throughput. For ImageNet [11], CIFAR10 [2], and Div2K [1], we use SGD with 0.9 momentum for VGG16 [50], ResNets (RN18 and RN50) [20], Wide ResNet (WRN) [59], and VDSR [29]. IMDB [39] and Text Copy [4] are trained using ADAM with CNN [53], RNN [53], and transformer heads [54].
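The paired compression/decompression of stored activations described above can be pictured with a small NumPy sketch. This is an illustrative uniform-quantization example only, not the authors' actual compression scheme or API; the `compress`/`decompress` names and the 8-bit setting are assumptions.

```python
import numpy as np

def compress(act, bits=8):
    """Uniform fixed-point quantization of an activation tensor (illustrative only)."""
    lo, hi = float(act.min()), float(act.max())
    scale = max((hi - lo) / (2 ** bits - 1), 1e-12)
    q = np.round((act - lo) / scale).astype(np.uint16)
    return q, lo, scale

def decompress(q, lo, scale):
    """Reconstruct an approximate activation from its quantized form."""
    return q.astype(np.float32) * scale + lo

# Paired compression/decompression: store the compressed activation in the
# forward pass, reconstruct it before the backward pass uses it.
act = np.random.randn(64, 128).astype(np.float32)
q, lo, scale = compress(act, bits=8)
act_hat = decompress(q, lo, scale)
print("max abs reconstruction error:", np.abs(act - act_hat).max())
```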
Researcher Affiliation | Academia | R. David Evans, Dept. of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC V6T 1Z4, rdevans@ece.ubc.ca; Tor M. Aamodt, Dept. of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC V6T 1Z4, aamodt@ece.ubc.ca
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | Code is available at https://github.com/rdevans0/acgc.
Open Datasets | Yes | For ImageNet [11], CIFAR10 [2], and Div2K [1], we use SGD with 0.9 momentum for VGG16 [50], ResNets (RN18 and RN50) [20], Wide ResNet (WRN) [59], and VDSR [29]. IMDB [39] and Text Copy [4] are trained using ADAM with CNN [53], RNN [53], and transformer heads [54]. All image datasets are augmented with random sizing, flip, and crop, as well as whitening and PCA for ImageNet [30], and 8×8 cutout for CIFAR10 [12]. Learning rates, batch sizes, and epochs are 0.05, 128, 300 (CIFAR10, [49]); 0.1, 64, 105 (ImageNet, [58]); 0.1, 32, 110 (Div2K, grid search); 2.0, 64, 100 (Text Copy, [4]); and 0.001, 64, 20 (IMDB, [53]).
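For readability, the quoted per-dataset hyperparameters can be collected into one structure. The dictionary below is a hypothetical summary; its layout and key names are illustrative and not taken from the released code.

```python
# Hypothetical summary of the quoted training configurations.
TRAIN_CONFIGS = {
    # dataset:  optimizer,                 learning rate, batch size, epochs
    "CIFAR10":  {"opt": "SGD (momentum 0.9)", "lr": 0.05,  "batch": 128, "epochs": 300},
    "ImageNet": {"opt": "SGD (momentum 0.9)", "lr": 0.1,   "batch": 64,  "epochs": 105},
    "Div2K":    {"opt": "SGD (momentum 0.9)", "lr": 0.1,   "batch": 32,  "epochs": 110},
    "TextCopy": {"opt": "ADAM",               "lr": 2.0,   "batch": 64,  "epochs": 100},
    "IMDB":     {"opt": "ADAM",               "lr": 0.001, "batch": 64,  "epochs": 20},
}

for name, cfg in TRAIN_CONFIGS.items():
    print(f"{name}: {cfg['opt']}, lr={cfg['lr']}, batch={cfg['batch']}, epochs={cfg['epochs']}")
```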
Dataset Splits | No | The paper does not explicitly provide percentages or counts for training, validation, and test splits. While standard datasets were used, the specific split information is not stated in the text.
Hardware Specification | Yes | Table 2: Trained using 900 GPU-days (RTX 2080 Ti).
Software Dependencies | No | The paper mentions modifying the Chainer framework [53] but does not specify a version number for Chainer or any other ancillary software.
Experiment Setup | Yes | Learning rates, batch sizes, and epochs are 0.05, 128, 300 (CIFAR10, [49]); 0.1, 64, 105 (ImageNet, [58]); 0.1, 32, 110 (Div2K, grid search); 2.0, 64, 100 (Text Copy, [4]); and 0.001, 64, 20 (IMDB, [53]). Unless otherwise stated, all experiments use e2 = 0.5, parameter estimates from the mean of a ten entry window, and a recalculation interval of 100 iterations.
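The recalculation schedule quoted above (ten-entry window, 100-iteration interval, fixed error target) can be sketched as follows. Only the windowing and scheduling logic reflect the quoted setup; the `estimate_bits` helper is a placeholder assumption and does not reproduce the paper's convergence-preserving error-bound derivation.

```python
from collections import deque

ERROR_TARGET = 0.5      # the quoted e2 = 0.5 setting
WINDOW_SIZE = 10        # parameter estimates use the mean of a ten-entry window
RECALC_INTERVAL = 100   # compression parameters recalculated every 100 iterations

def estimate_bits(window_mean, error_target):
    """Placeholder mapping from a smoothed statistic to a bit width; the real
    method derives bit widths from its analytical error bound."""
    return max(2, int(round(16 - window_mean / error_target)))

history = deque(maxlen=WINDOW_SIZE)
bits = 8  # initial bit width (assumed)

for iteration in range(1, 501):
    per_iter_stat = 4.0  # stand-in for the quantity measured each iteration
    history.append(per_iter_stat)
    if iteration % RECALC_INTERVAL == 0:
        window_mean = sum(history) / len(history)
        bits = estimate_bits(window_mean, ERROR_TARGET)
        print(f"iteration {iteration}: recalculated bit width = {bits}")
```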