AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients

Authors: Juntang Zhuang, Tommy Tang, Yifan Ding, Sekhar C. Tatikonda, Nicha Dvornek, Xenophon Papademetris, James Duncan

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We validate AdaBelief in extensive experiments, showing that it outperforms other methods with fast convergence and high accuracy on image classification and language modeling."
Researcher Affiliation | Academia | "Juntang Zhuang (1); Tommy Tang (2); Yifan Ding (3); Sekhar Tatikonda (1); Nicha Dvornek (1); Xenophon Papademetris (1); James S. Duncan (1)" with (1) Yale University; (2) University of Illinois at Urbana-Champaign; (3) University of Central Florida
Pseudocode | Yes | "Algorithm 1: Adam Optimizer" and "Algorithm 2: AdaBelief Optimizer" (the AdaBelief update is sketched after this table)
Open Source Code | Yes | "Code is available at https://github.com/juntang-zhuang/Adabelief-Optimizer"
Open Datasets | Yes | "CNNs on image classification: We experiment with VGG11, ResNet34 and DenseNet121 on Cifar10 and Cifar100 dataset... We then train a ResNet18 on ImageNet... LSTM on language modeling: We experiment with LSTM on the Penn TreeBank dataset [34]." These are common public datasets.
Dataset Splits | Yes | "We then train a ResNet18 on ImageNet, and report the accuracy on the validation set in Table 2." and "report the mean and standard deviation of test-set accuracy (under optimal hyperparameters) for 3 runs with random initialization."
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions models and optimizers but does not provide version numbers for software dependencies such as deep learning frameworks (e.g., PyTorch, TensorFlow) or other libraries.
Experiment Setup | Yes | "AdaBelief: We use the default parameters of Adam: β1 = 0.9, β2 = 0.999, ϵ = 10^-8, α = 10^-3." and "SGD, Fromage: We set the momentum as 0.9... We search learning rate among {10.0, 1.0, 0.1, 0.01, 0.001}." (a usage sketch with these defaults follows the table)
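
For reference, the core update of Algorithm 2 can be written in a few lines of NumPy. This is a minimal sketch of the rule described in the paper, not the authors' released code; the function name and NumPy framing are ours. AdaBelief differs from Adam (Algorithm 1) only in the second-moment estimate: it accumulates the squared deviation of the gradient from its exponential moving average, rather than the squared gradient itself.

    import numpy as np

    def adabelief_step(theta, grad, m, s, t,
                       lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        """One AdaBelief update (a NumPy sketch of the paper's Algorithm 2).

        t is the 1-based step count, used for bias correction.
        """
        m = beta1 * m + (1 - beta1) * grad                   # EMA of gradients
        # Adam would accumulate grad**2 here; AdaBelief instead accumulates
        # (grad - m)**2, the squared deviation from the gradient EMA.
        s = beta2 * s + (1 - beta2) * (grad - m) ** 2 + eps
        m_hat = m / (1 - beta1 ** t)                         # bias correction
        s_hat = s / (1 - beta2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(s_hat) + eps)  # parameter update
        return theta, m, s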
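Assuming the adabelief_pytorch package from the repository linked above (argument names as in its README; note that later releases changed some defaults, e.g. a smaller eps and rectification enabled by default), plugging in the paper's stated hyperparameters would look roughly like this:

    # pip install adabelief-pytorch
    import torch
    from adabelief_pytorch import AdaBelief

    model = torch.nn.Linear(28 * 28, 10)       # stand-in model
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = AdaBelief(model.parameters(),
                          lr=1e-3,             # alpha = 10^-3
                          betas=(0.9, 0.999),  # (beta1, beta2)
                          eps=1e-8)            # epsilon = 10^-8

    x = torch.randn(32, 28 * 28)               # dummy batch
    y = torch.randint(0, 10, (32,))
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

The SGD baseline quoted above would be configured analogously, e.g. torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9) with lr swept over {10.0, 1.0, 0.1, 0.01, 0.001}.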