AdaLoss: A Computationally-Efficient and Provably Convergent Adaptive Gradient Method

Authors: Xiaoxia Wu, Yuege Xie, Simon Shaolei Du, Rachel Ward

AAAI 2022, pp. 8691-8699 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We numerically verify the theoretical results and extend the scope of the numerical experiments by considering applications in LSTM models for text classification and policy gradients for control problems.
Researcher Affiliation | Collaboration | Xiaoxia Wu1*, Yuege Xie2, Simon Shaolei Du3, and Rachel Ward2; 1 Microsoft, 2 The University of Texas at Austin, 3 University of Washington
Pseudocode | Yes | Algorithm 1: AdaLoss Algorithm; Algorithm 2: AdamLoss (a sketch of the update rule follows the table)
Open Source Code | Yes | Code is available at github.com/willway1023yx/adaloss
Open Datasets | Yes | We fine-tune the pretrained model (ViT-S/16) on CIFAR100 (with 45k training and 5k validation images) over 10 epochs, and show test accuracy (with mean and std over three independent runs) of the best model chosen by validation data, on 10k test images.
Dataset Splits | Yes | We fine-tune the pretrained model (ViT-S/16) on CIFAR100 (with 45k training and 5k validation images) over 10 epochs, and show test accuracy (with mean and std over three independent runs) of the best model chosen by validation data, on 10k test images.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions types of models and methods (e.g., LSTM, policy gradient methods) but does not provide specific software dependencies or library versions needed to replicate the experiment.
Experiment Setup | Yes | For the fine-tuning experiments, we set η = 0.1; for AdamLoss (Algorithm 2), we set α = 1 with β1 = 0.9 and β2 = 0.99. CIFAR100 fine-tuning was performed over 10 epochs.
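Note on the pseudocode and the reported hyperparameters: AdaLoss follows the AdaGrad-Norm template but grows the step-size normalizer with the loss value instead of the squared gradient norm. The snippet below is a minimal sketch under that assumption, reusing the reported η = 0.1 and α = 1; the function name adaloss_sgd, the initial accumulator b0, the step count, and the toy least-squares problem are illustrative choices, not taken from the paper, and the sketch omits the momentum terms (β1 = 0.9, β2 = 0.99) of the AdamLoss variant.

import numpy as np

def adaloss_sgd(grad_fn, loss_fn, w0, eta=0.1, alpha=1.0, b0=0.1, steps=200):
    """Sketch of a loss-adapted gradient step: the accumulator b_t grows with
    the loss value f(w_t) (scaled by alpha) rather than with ||grad f(w_t)||^2,
    and the update is w_{t+1} = w_t - (eta / b_{t+1}) * grad f(w_t)."""
    w = np.asarray(w0, dtype=float)
    b_sq = b0 ** 2                                  # b_0^2, illustrative initialization
    for _ in range(steps):
        b_sq += alpha * loss_fn(w)                  # b_{t+1}^2 = b_t^2 + alpha * f(w_t)
        w = w - (eta / np.sqrt(b_sq)) * grad_fn(w)  # loss-normalized gradient step
    return w

# Illustrative least-squares problem: f(w) = 0.5 * ||A w - y||^2.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, -1.0])
loss_fn = lambda w: 0.5 * np.sum((A @ w - y) ** 2)
grad_fn = lambda w: A.T @ (A @ w - y)
w_hat = adaloss_sgd(grad_fn, loss_fn, w0=np.zeros(2))
print(w_hat)  # approaches the least-squares solution [0.5, -1.0]

Because the normalizer only needs the scalar loss already computed in the forward pass, this kind of update avoids the extra gradient-norm accumulation of AdaGrad-Norm, which is the computational-efficiency point in the paper's title.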