Optimizer Amalgamation

Authors: Tianshu Huang, Tianlong Chen, Sijia Liu, Shiyu Chang, Lisa Amini, Zhangyang Wang

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we present experiments showing the superiority of our amalgamated optimizer compared to its amalgamated components and learning to optimize baselines, and the efficacy of our variance reducing perturbations. Our code and pre-trained models are publicly available at http://github.com/VITA-Group/Optimizer Amalgamation.
Researcher Affiliation | Collaboration | Tianshu Huang (1,2), Tianlong Chen (1), Sijia Liu (3), Shiyu Chang (4), Lisa Amini (5), Zhangyang Wang (1); 1: University of Texas at Austin, 2: Carnegie Mellon University, 3: Michigan State University, 4: University of California, Santa Barbara, 5: MIT-IBM Watson AI Lab, IBM Research
Pseudocode | Yes | Algorithm 1: Distillation by Truncated Back-propagation; Algorithm 2: Adversarial Weight Perturbation for Truncated Back-propagation. (A minimal Python sketch of the Algorithm 1 distillation loop appears after this table.)
Open Source Code | Yes | Our code and pre-trained models are publicly available at http://github.com/VITA-Group/Optimizer Amalgamation.
Open Datasets | Yes | All datasets were accessed using TensorFlow Datasets and have a CC-BY 4.0 license. The MNIST dataset (LeCun & Cortes, 2010) is used during training; the other datasets, from most to least similar to MNIST, are: FMNIST: Fashion-MNIST (Xiao et al., 2017); SVHN: Street View House Numbers, cropped (Netzer et al., 2011); CIFAR-10 (Krizhevsky et al., 2009). (See the TensorFlow Datasets loading sketch after this table.)
Dataset Splits | No | The selection criterion is the best validation loss after 5 epochs for the Train network on MNIST, which matches the meta-training settings of the amalgamated optimizer. No specific percentages or sample counts for training/validation splits were explicitly provided.
Hardware Specification | Yes | All experiments were run on single nodes with 4x Nvidia 1080ti GPUs, providing us with a meta-batch size of 4 simultaneous optimizations.
Software Dependencies | No | The paper mentions using "TensorFlow Datasets" but does not specify version numbers for TensorFlow or any other software libraries used, which are required for reproducibility.
Experiment Setup | Yes | The RNNProp amalgamation target was trained using truncated backpropagation through time with a constant truncation length of 100 steps and a total unroll of up to 1000 steps, and was meta-optimized by Adam with a learning rate of 1 × 10^-3. For our training process, we also apply random scaling (Lv et al., 2017) and curriculum learning (Chen et al., 2020a); more details about amalgamation training are provided in Appendix C.3. During training, a batch size of 128 is used except for the Small Batch evaluation, which has a batch size of 32. The SGD learning rate is fixed at 0.01. Warmup: instead of initializing each training optimizee with random weights, we first apply 100 steps of SGD optimization as a warmup. (These settings are collected in the configuration sketch after this table.)
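
The Pseudocode row references Algorithm 1 (Distillation by Truncated Back-propagation). Below is a minimal sketch of that idea, not the paper's implementation: it assumes a toy least-squares optimizee in place of the Train network, an MLP student in place of RNNProp, a single Adam teacher, a first-order approximation through the optimizee gradients, and an illustrative squared-distance imitation term. The names `policy`, `teacher_step`, and `amalgamate` are hypothetical; only the truncation length (100), total unroll (1000), and meta-optimizer (Adam, lr 1e-3) follow the reported setup.

```python
import tensorflow as tf

# Student "learned optimizer": a per-parameter MLP mapping gradients to updates.
# (The paper's student is RNNProp, an LSTM-based optimizer; an MLP keeps the sketch short.)
policy = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
policy(tf.zeros((1, 1)))                    # build variables before taping
meta_opt = tf.keras.optimizers.Adam(1e-3)   # meta-optimizer (Adam, lr 1e-3)


def optimizee_loss(w, x, y):
    # Toy least-squares optimizee standing in for the paper's MLP on MNIST.
    return tf.reduce_mean((tf.linalg.matvec(x, w) - y) ** 2)


def grad_of(w, x, y):
    # Gradient of the optimizee loss at parameters w.
    with tf.GradientTape() as g:
        g.watch(w)
        loss = optimizee_loss(w, x, y)
    return g.gradient(loss, w)


def teacher_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Analytical teacher (Adam) whose behaviour is distilled into the student.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    mhat = m / (1 - tf.pow(b1, t))
    vhat = v / (1 - tf.pow(b2, t))
    return w - lr * mhat / (tf.sqrt(vhat) + eps), m, v


def amalgamate(x, y, unroll=100, total=1000, imitation_weight=1.0):
    dim = x.shape[1]
    w_s = tf.zeros(dim)                     # student-optimized parameters
    w_t = tf.zeros(dim)                     # teacher-optimized parameters
    m = tf.zeros(dim)
    v = tf.zeros(dim)
    for start in range(0, total, unroll):   # truncation length: 100 steps
        with tf.GradientTape() as tape:
            meta_loss = 0.0
            for k in range(unroll):
                t = tf.cast(start + k + 1, tf.float32)
                # First-order approximation: stop gradients through the
                # optimizee gradient, as is common in learning to optimize.
                g_s = tf.stop_gradient(grad_of(w_s, x, y))
                update = tf.squeeze(policy(tf.reshape(g_s, (-1, 1))), axis=-1)
                w_s = w_s + update
                w_t, m, v = teacher_step(
                    w_t, tf.stop_gradient(grad_of(w_t, x, y)), m, v, t)
                # Meta-loss: optimizee loss plus an illustrative imitation term
                # pulling the student trajectory toward the teacher's.
                meta_loss += optimizee_loss(w_s, x, y)
                meta_loss += imitation_weight * tf.reduce_sum(
                    (w_s - tf.stop_gradient(w_t)) ** 2)
        grads = tape.gradient(meta_loss, policy.trainable_variables)
        meta_opt.apply_gradients(zip(grads, policy.trainable_variables))
        # Truncate: detach all carried state before the next unroll window.
        w_s, w_t, m, v = (tf.stop_gradient(z) for z in (w_s, w_t, m, v))
    return policy


# Usage on a synthetic problem:
x = tf.random.normal((128, 8))
y = tf.random.normal((128,))
amalgamate(x, y)
```

The paper additionally amalgamates several analytical optimizers at once and applies variance-reducing weight perturbations (Algorithm 2), neither of which is shown here.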
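The Open Datasets row states that all datasets were accessed through TensorFlow Datasets. The following loading sketch assumes the standard TFDS catalog names and splits (the paper does not state split percentages); the `load` helper and its preprocessing are illustrative, not the repository's pipeline.

```python
import tensorflow as tf
import tensorflow_datasets as tfds


def load(name, split, batch_size=128):
    # batch_size=128 during training; 32 for the "Small Batch" evaluation.
    ds = tfds.load(name, split=split, as_supervised=True)
    ds = ds.map(lambda image, label: (tf.cast(image, tf.float32) / 255.0, label))
    return ds.shuffle(10_000).batch(batch_size).prefetch(tf.data.AUTOTUNE)


# MNIST is used for meta-training; the transfer sets, from most to least
# similar to MNIST, are Fashion-MNIST, SVHN (cropped), and CIFAR-10.
train_ds = load("mnist", "train")
transfer = {name: load(name, "test")
            for name in ("fashion_mnist", "svhn_cropped", "cifar10")}
```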
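The Experiment Setup row's reported hyperparameters, collected in one place as a plain dictionary. The keys are illustrative and do not reflect the repository's actual configuration schema.

```python
# Settings reported in the paper (values only; key names are assumptions).
TRAINING_SETUP = {
    "truncation_length": 100,      # constant truncation length (steps)
    "total_unroll": 1000,          # total unroll of up to 1000 steps
    "meta_optimizer": "adam",
    "meta_learning_rate": 1e-3,
    "random_scaling": True,        # Lv et al., 2017
    "curriculum_learning": True,   # Chen et al., 2020a
    "batch_size": 128,             # 32 for the Small Batch evaluation
    "sgd_learning_rate": 0.01,     # fixed SGD learning rate
    "warmup_steps": 100,           # 100 SGD steps before each training optimizee
    "meta_batch_size": 4,          # 4 simultaneous optimizations (4x 1080ti node)
}
```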