Learning with Auxiliary Activation for Memory-Efficient Training

Authors: Sunghyeon Woo, Dongsuk Jeon

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results confirm that the proposed learning rule achieves competitive performance compared to backpropagation in various models such as ResNet, Transformer, BERT, ViT, and MLP-Mixer.
Researcher Affiliation | Academia | Sunghyeon Woo, Dongsuk Jeon; Seoul National University, Seoul, Korea; {wsh0917,djeon1}@snu.ac.kr
Pseudocode | No | The paper provides mathematical equations and figures to describe the method but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Furthermore, we support code to reproduce our results as follows: https://github.com/WooSunghyeon/Auxiliary-Activation-Learning
Open Datasets | Yes | The CIFAR-10 dataset (46) consists of 50,000 training images and 10,000 test images, which are 32x32 RGB images for 10-class image classification. Likewise, the CIFAR-100 dataset (46) includes 50,000 training images and 10,000 test images with the same resolution for 100-class image classification. The Tiny-ImageNet dataset (47) consists of images of 200 classes and each class has 500 images for training. It also contains 10,000 test images. All images included in Tiny-ImageNet are selected from ImageNet and downsized to 64x64.
Dataset Splits | Yes | The CIFAR-10 dataset (46) consists of 50,000 training images and 10,000 test images, which are 32x32 RGB images for 10-class image classification. Likewise, the CIFAR-100 dataset (46) includes 50,000 training images and 10,000 test images with the same resolution for 100-class image classification. Tiny-ImageNet... consists of images of 200 classes and each class has 500 images for training. It also contains 10,000 test images. ImageNet... consists of 1,281,167 training images, 50,000 validation images, and 100,000 test images... The IWSLT 2016 dataset (49)... as a training set, while the validation set consists of 13 talks, 1.1K sentences, and 21K tokens. The Multi-Genre Natural Language Inference (MultiNLI) dataset (51)... is divided into 392,702 training examples, 9,815 matched validation examples (drawn from the same genres as the training set), and 9,832 mismatched validation examples (from genres not included in training). (A data-loading sketch follows the table.)
Hardware Specification | Yes | In all experiments, we used NVIDIA GeForce RTX 3090 GPUs. While one GPU was utilized to perform most of the experiments, we used six GPUs to train ResNet on ImageNet to reduce training time. (A multi-GPU sketch follows the table.)
Software Dependencies | No | The paper mentions using 'PyTorch' but does not specify its version number or the versions of other key software dependencies such as CUDA.
Experiment Setup | Yes | For training ResNet-18 from scratch on CIFAR-10, CIFAR-100, and Tiny-ImageNet, we set the batch size and the total number of epochs to 128 and 200, respectively. We applied stochastic gradient descent with momentum (53) along with weight decay (46) in our experiments. The momentum and weight decay rate were set to 0.9 and 1e-4, respectively. The learning rate of all layers except the ASA layers was scheduled by cosine annealing (54) with an initial learning rate of 0.1 over the 200 epochs. In comparison, we used a 100x higher learning rate for the ASA layers to make the magnitude of their weight updates comparable to those of other layers. We also set ϵ in equations (15) and (16) to 0.01. (An optimizer-configuration sketch follows the table.)
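
As a concrete illustration of the image-dataset splits quoted in the Dataset Splits row, the sketch below loads CIFAR-10 with torchvision and checks the reported 50,000/10,000 train/test split. The augmentation transforms are an assumption (the quoted text does not state them), and Tiny-ImageNet is omitted because torchvision ships no loader for it.

    import torch
    from torchvision import datasets, transforms

    # Assumed augmentation; the quoted dataset description does not specify transforms.
    train_tf = transforms.Compose([
        transforms.RandomCrop(32, padding=4),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])
    test_tf = transforms.ToTensor()

    # CIFAR-10: 50,000 training / 10,000 test images, 32x32 RGB, 10 classes.
    train_set = datasets.CIFAR10("./data", train=True, download=True, transform=train_tf)
    test_set = datasets.CIFAR10("./data", train=False, download=True, transform=test_tf)
    assert len(train_set) == 50_000 and len(test_set) == 10_000

    # Batch size 128, as reported in the Experiment Setup row.
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)
    test_loader = torch.utils.data.DataLoader(test_set, batch_size=128, shuffle=False, num_workers=4)

The same pattern applies to CIFAR-100 via datasets.CIFAR100; Tiny-ImageNet and ImageNet would instead be loaded from local folders.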
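The Hardware Specification row states that six RTX 3090s were used for ResNet on ImageNet, but not which parallelization API. The sketch below assumes plain torch.nn.DataParallel and a ResNet-50 stand-in; DistributedDataParallel and other ResNet depths are equally plausible.

    import torch
    import torch.nn as nn
    from torchvision import models

    # Assumption: single-node data parallelism across whatever CUDA GPUs are visible.
    model = models.resnet50(num_classes=1000)
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)  # replicates the model and splits each batch across GPUs
    model = model.cuda()

    x = torch.randn(24, 3, 224, 224).cuda()  # dummy ImageNet-sized batch
    logits = model(x)
    print(logits.shape)  # torch.Size([24, 1000])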
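The Experiment Setup row maps directly onto a PyTorch optimizer configuration: SGD with momentum 0.9 and weight decay 1e-4, cosine annealing from an initial rate of 0.1 over 200 epochs for the ordinary layers, and a 100x higher rate for the ASA layers. The sketch below uses torchvision's ResNet-18 plus a dummy parameter named "asa_scale" as a stand-in, since the actual ASA layers live only in the authors' released code; holding the ASA-layer rate constant (rather than also annealing it) is likewise an assumption.

    import math
    import torch
    from torch import nn, optim
    from torchvision import models

    # Stand-in model: torchvision ResNet-18 plus one dummy "ASA" parameter,
    # since the real ASA layers are defined in the authors' repository.
    model = models.resnet18(num_classes=10)
    model.asa_scale = nn.Parameter(torch.ones(1))  # hypothetical ASA-layer parameter

    asa_params = [p for n, p in model.named_parameters() if "asa" in n]
    base_params = [p for n, p in model.named_parameters() if "asa" not in n]

    base_lr = 0.1
    optimizer = optim.SGD(
        [
            {"params": base_params, "lr": base_lr},
            {"params": asa_params, "lr": base_lr * 100},  # 100x higher LR for ASA layers
        ],
        momentum=0.9,       # as reported
        weight_decay=1e-4,  # as reported
    )

    # Cosine annealing for the ordinary layers over 200 epochs; the ASA-layer
    # schedule is not stated in the quoted text, so it is held constant here.
    scheduler = optim.lr_scheduler.LambdaLR(
        optimizer,
        lr_lambda=[
            lambda epoch: 0.5 * (1.0 + math.cos(math.pi * epoch / 200)),
            lambda epoch: 1.0,
        ],
    )

    for epoch in range(200):
        # one epoch over 128-sample mini-batches would run here
        scheduler.step()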