DIVISION: Memory Efficient Training via Dual Activation Precision
Authors: Guanchu Wang, Zirui Liu, Zhimeng Jiang, Ninghao Liu, Na Zou, Xia Hu
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiment results show DIVISION has better comprehensive performance than state-of-the-art methods, including over 10× compression of activation maps and competitive training throughput, without loss of model accuracy. |
| Researcher Affiliation | Academia | (1) Department of Computer Science, Rice University; (2) Department of Computer Science and Engineering, Texas A&M University; (3) Department of Computer Science, University of Georgia; (4) Department of Engineering Technology, Texas A&M University. |
| Pseudocode | Yes | "Algorithm 1: Mini-batch updating of DIVISION" and "Algorithms 2, 3, 4 and 5 to compress the activation map of a Max-Pooling layer, Average-Pooling layer, ReLU activation and Dropout operator, respectively." |
| Open Source Code | Yes | The source code is available at https://github.com/guanchuwang/division. |
| Open Datasets | Yes | We consider CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009) and ImageNet (Deng et al., 2009) datasets in our experiments. |
| Dataset Splits | Yes | "Our reproduced validation accuracy on the ImageNet dataset is consistent with the official results of torchvision" and "We consider CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009) and ImageNet (Deng et al., 2009) datasets in our experiments." |
| Hardware Specification | Yes | GPU model: Nvidia RTX 3090 |
| Software Dependencies | Yes | CUDA Version 12.0 |
| Experiment Setup | Yes | "Table 10: Hyper-parameter setting", which includes "Epoch 100", "Batch-size 256", "Initial LR 0.1", "LR scheduler CosLR", "Weight-decay 0.0005", "Optimizer SGD", "Momentum 0.9", "Block-size B 8", "Bit-width Q 2" (illustrative sketches of this setting are given below the table). |
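To make the reported hyper-parameter setting concrete, here is a minimal sketch of how Table 10's values map onto a standard PyTorch training loop. This is an assumption-laden illustration, not the authors' training script: the ResNet-18 backbone, the CIFAR-100 class count, and the placeholder data loop are ours, and DIVISION's activation compression itself lives inside the layers provided in the authors' repository.

```python
# Minimal sketch of the Table 10 hyper-parameter setting in plain PyTorch.
# The backbone and data pipeline are placeholders; DIVISION's activation
# compression is applied inside its own layer implementations (see the repo).
import torch
from torch import nn
from torchvision import models

EPOCHS = 100        # "Epoch 100"
BATCH_SIZE = 256    # "Batch-size 256" (used when building the train_loader)

model = models.resnet18(num_classes=100)  # placeholder backbone (e.g. CIFAR-100)

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,             # "Initial LR 0.1"
    momentum=0.9,       # "Momentum 0.9"
    weight_decay=5e-4,  # "Weight-decay 0.0005"
)
# "LR scheduler CosLR": cosine annealing over the full training run.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)
criterion = nn.CrossEntropyLoss()

for epoch in range(EPOCHS):
    # for x, y in train_loader:            # mini-batches of size 256
    #     loss = criterion(model(x), y)
    #     optimizer.zero_grad(); loss.backward(); optimizer.step()
    scheduler.step()
```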
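The "Block-size B 8" and "Bit-width Q 2" entries refer to how DIVISION caches activation maps at low precision. The sketch below is a generic per-block min-max quantizer that only illustrates what block-wise 2-bit caching of activations looks like; it is not the paper's exact procedure (DIVISION quantizes only the high-frequency component of the activation map), and all function names here are ours.

```python
# Illustrative 2-bit, block-wise quantization of an activation tensor with
# block size B = 8 and bit-width Q = 2, matching the Table 10 setting.
# Generic per-block min-max quantizer, NOT the exact DIVISION algorithm.
import torch

def blockwise_quantize(x: torch.Tensor, block_size: int = 8, bits: int = 2):
    """Quantize a flattened activation map block by block (hypothetical helper)."""
    flat = x.flatten()
    pad = (-flat.numel()) % block_size
    flat = torch.nn.functional.pad(flat, (0, pad))
    blocks = flat.view(-1, block_size)                   # one row per block
    lo = blocks.min(dim=1, keepdim=True).values
    hi = blocks.max(dim=1, keepdim=True).values
    scale = (hi - lo).clamp_min(1e-8) / (2 ** bits - 1)
    codes = torch.round((blocks - lo) / scale).to(torch.uint8)  # values in [0, 3]
    return codes, lo, scale                              # cache these instead of x

def blockwise_dequantize(codes, lo, scale, shape, numel):
    blocks = codes.float() * scale + lo
    return blocks.flatten()[:numel].view(shape)

x = torch.randn(4, 16, 8, 8)                             # toy activation map
codes, lo, scale = blockwise_quantize(x)
x_hat = blockwise_dequantize(codes, lo, scale, x.shape, x.numel())
print((x - x_hat).abs().max())                           # bounded quantization error
```

Storing 2-bit codes plus one (min, scale) pair per 8-element block is what makes the roughly 10× activation-memory reduction quoted above plausible, relative to 32-bit floating-point activations.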