Learning with Auxiliary Activation for Memory-Efficient Training
Authors: Sunghyeon Woo, Dongsuk Jeon
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results confirm that the proposed learning rule achieves competitive performance compared to backpropagation in various models such as ResNet, Transformer, BERT, ViT, and MLP-Mixer. |
| Researcher Affiliation | Academia | Sunghyeon Woo, Dongsuk Jeon; Seoul National University, Seoul, Korea; {wsh0917,djeon1}@snu.ac.kr |
| Pseudocode | No | The paper provides mathematical equations and figures to describe the process but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Furthermore, we support code to reproduce our results as follows: https://github.com/WooSunghyeon/Auxiliary-Activation-Learning. |
| Open Datasets | Yes | The CIFAR-10 dataset (46) consists of 50,000 training images and 10,000 test images which are 32x32 RGB images for 10-class image classification. Likewise, the CIFAR-100 dataset (46) includes 50,000 training images and 10,000 test images with the same resolution for 100-class image classification. The Tiny ImageNet dataset (47) consists of images of 200 classes and each class has 500 images for training. It also contains 10,000 test images. All images included in Tiny ImageNet are selected from ImageNet and downsized to 64x64. |
| Dataset Splits | Yes | The CIFAR-10 dataset (46) consists of 50,000 training images and 10,000 test images which are 32x32 RGB images for 10-class image classification. Likewise, the CIFAR-100 dataset (46) includes 50,000 training images and 10,000 test images with the same resolution for 100-class image classification. Tiny ImageNet... consists of images of 200 classes and each class has 500 images for training. It also contains 10,000 test images. ImageNet... consists of 1,281,167 training images, 50,000 validation images, and 100,000 test images... The IWSLT 2016 dataset (49)... as a training set while the validation set consists of 13 talks, 1.1K sentences, and 21K tokens. The Multi-Genre Natural Language Inference (MultiNLI) dataset (51)... The dataset is divided into 392,702 train sets, 9,815 validation matched sets which are subsets of train sets, and 9,832 validation mismatched sets which are included in train sets. |
| Hardware Specification | Yes | In all experiments, we used Nvidia GeForce RTX 3090 GPUs. While one GPU was utilized to perform most of the experiments, we used six GPUs to train ResNet on ImageNet to reduce training time. |
| Software Dependencies | No | The paper mentions using 'PyTorch' but does not specify its version number or versions for other key software dependencies like CUDA. |
| Experiment Setup | Yes | For training ResNet-18 from scratch on CIFAR-10, CIFAR-100, and Tiny ImageNet, we set the batch size and the total number of epochs to 128 and 200, respectively. We applied stochastic gradient descent with momentum (53) along with weight decay (46) to our experiments. The momentum and weight decay rate were set to 0.9 and 1e-4, respectively. The learning rate of the layers except for ASA layers was scheduled by cosine annealing (54) with a 0.1 initial learning rate during 200 epochs. In comparison, we used a 100x higher learning rate for ASA layers to make the magnitude of weight updates comparable to those of other layers. We also set ϵ in equations (15) and (16) to 0.01. A minimal configuration sketch is given below the table. |
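
The following is a minimal PyTorch sketch of the reported ResNet-18 training configuration (batch size 128, 200 epochs, SGD with momentum 0.9 and weight decay 1e-4, cosine-annealed base learning rate 0.1, and a 100x higher learning rate for ASA layers). The helpers `model.base_parameters()` and `model.asa_parameters()` are hypothetical placeholders for however the released code separates ASA-layer parameters; this is not the authors' implementation.

```python
# Sketch of the quoted training setup; values come from the table above,
# helper method names are assumptions, not the paper's actual API.
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR

BATCH_SIZE = 128        # reported batch size
EPOCHS = 200            # reported number of training epochs
BASE_LR = 0.1           # initial learning rate for non-ASA layers
ASA_LR = BASE_LR * 100  # ASA layers use a 100x higher learning rate

def build_optimizer(model):
    # Two parameter groups so ASA layers can receive the larger learning rate.
    param_groups = [
        {"params": model.base_parameters(), "lr": BASE_LR},  # hypothetical helper
        {"params": model.asa_parameters(), "lr": ASA_LR},    # hypothetical helper
    ]
    optimizer = SGD(param_groups, momentum=0.9, weight_decay=1e-4)
    # Cosine annealing of both learning rates over the full 200 epochs.
    scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)
    return optimizer, scheduler
```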