DIVISION: Memory Efficient Training via Dual Activation Precision

Authors: Guanchu Wang, Zirui Liu, Zhimeng Jiang, Ninghao Liu, Na Zou, Xia Hu

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiment results show DIVISION has better comprehensive performance than state-of-the-art methods, including over 10× compression of activation maps and competitive training throughput, without loss of model accuracy."
Researcher Affiliation | Academia | "1 Department of Computer Science, Rice University; 2 Department of Computer Science and Engineering, Texas A&M University; 3 Department of Computer Science, University of Georgia; 4 Department of Engineering Technology, Texas A&M University."
Pseudocode | Yes | "Algorithm 1 Mini-batch updating of DIVISION" and "Algorithms 2, 3, 4 and 5 to compress the activation map of a Max-Pooling layer, Average-Pooling layer, ReLU activation and Dropout operator, respectively." (see the compression sketch after this table)
Open Source Code | Yes | "The source code is available at https://github.com/guanchuwang/division."
Open Datasets | Yes | "We consider CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009) and ImageNet (Deng et al., 2009) datasets in our experiments."
Dataset Splits | Yes | "Our reproduced validation accuracy on the ImageNet dataset is consistent with the official results of torchvision" and "We consider CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009) and ImageNet (Deng et al., 2009) datasets in our experiments."
Hardware Specification | Yes | "GPU model: Nvidia RTX 3090"
Software Dependencies | Yes | "CUDA Version 12.0"
Experiment Setup | Yes | "Table 10: Hyper-parameter setting.", which includes "Epoch 100", "Batch-size 256", "Initial LR 0.1", "LR scheduler Cos LR", "Weight-decay 0.0005", "Optimizer SGD", "Momentum 0.9", "Block-size B 8", "Bit-width Q 2". (see the training-setup sketch after this table)
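The hyper-parameters quoted from Table 10 map directly onto a standard PyTorch training setup. The snippet below is a minimal sketch of that configuration, assuming a torchvision ResNet and cosine annealing over the full 100-epoch run; it is illustrative and not taken from the DIVISION repository.

```python
import torch
from torchvision.models import resnet18

EPOCHS = 100          # "Epoch 100"
BATCH_SIZE = 256      # "Batch-size 256"

# Model choice is an assumption for illustration (e.g. a CIFAR-100 classifier).
model = resnet18(num_classes=100)

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,             # "Initial LR 0.1"
    momentum=0.9,       # "Momentum 0.9"
    weight_decay=5e-4,  # "Weight-decay 0.0005"
)

# "LR scheduler Cos LR": cosine annealing of the learning rate over all epochs.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)
```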
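The "Block-size B 8" and "Bit-width Q 2" entries correspond to DIVISION's dual activation precision, which stores a low-frequency component of each activation map in high precision and the high-frequency residual at low precision. The sketch below illustrates that idea only; it assumes the low-frequency component is estimated by block-wise average pooling with block size B and the residual is min-max quantized to Q bits, and the function names `compress_activation` / `decompress_activation` are hypothetical, not taken from the DIVISION codebase.

```python
import torch
import torch.nn.functional as F

def compress_activation(x, block_size=8, bits=2):
    """Sketch of dual activation precision: high-precision low-frequency
    component plus a `bits`-bit quantized high-frequency residual."""
    # Low-frequency component (LFC): block-wise average pooling, kept in FP32.
    lfc = F.avg_pool2d(x, kernel_size=block_size)
    # High-frequency component (HFC): residual against the upsampled LFC.
    hfc = x - F.interpolate(lfc, size=x.shape[-2:], mode="nearest")
    # Per-tensor min-max quantization of the residual to `bits` bits.
    levels = 2 ** bits - 1
    hfc_min = hfc.min()
    scale = (hfc.max() - hfc_min).clamp(min=1e-8) / levels
    hfc_q = torch.round((hfc - hfc_min) / scale).to(torch.uint8)
    return lfc, hfc_q, hfc_min, scale

def decompress_activation(lfc, hfc_q, hfc_min, scale, out_size):
    """Reconstruct the activation map when it is needed for backpropagation."""
    hfc = hfc_q.float() * scale + hfc_min
    return F.interpolate(lfc, size=out_size, mode="nearest") + hfc

# Usage: compress a ResNet-style activation map with B=8, Q=2 (as in Table 10).
x = torch.randn(256, 64, 56, 56)
lfc, hfc_q, hfc_min, scale = compress_activation(x, block_size=8, bits=2)
x_hat = decompress_activation(lfc, hfc_q, hfc_min, scale, x.shape[-2:])
```

Storing the 7×7 block averages in FP32 plus a 2-bit residual instead of the full FP32 map is what yields the roughly 10× activation-memory reduction reported above.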