AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients
Authors: Juntang Zhuang, Tommy Tang, Yifan Ding, Sekhar C. Tatikonda, Nicha Dvornek, Xenophon Papademetris, James Duncan
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate AdaBelief in extensive experiments, showing that it outperforms other methods with fast convergence and high accuracy on image classification and language modeling. |
| Researcher Affiliation | Academia | Juntang Zhuang¹; Tommy Tang²; Yifan Ding³; Sekhar Tatikonda¹; Nicha Dvornek¹; Xenophon Papademetris¹; James S. Duncan¹. ¹Yale University; ²University of Illinois at Urbana-Champaign; ³University of Central Florida |
| Pseudocode | Yes | "Algorithm 1: Adam Optimizer" and "Algorithm 2: AdaBelief Optimizer" |
| Open Source Code | Yes | Code is available at https://github.com/juntang-zhuang/Adabelief-Optimizer |
| Open Datasets | Yes | "CNNs on image classification: We experiment with VGG11, ResNet34 and DenseNet121 on the CIFAR-10 and CIFAR-100 datasets... We then train a ResNet18 on ImageNet... LSTM on language modeling: We experiment with LSTM on the Penn TreeBank dataset [34]". These are common public datasets. |
| Dataset Splits | Yes | "We then train a ResNet18 on ImageNet, and report the accuracy on the validation set in Table 2." and "report the mean and standard deviation of test-set accuracy (under optimal hyperparameters) for 3 runs with random initialization." |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running experiments. |
| Software Dependencies | No | The paper mentions models and optimizers but does not provide specific version numbers for software dependencies such as deep learning frameworks (e.g., PyTorch, TensorFlow) or other libraries. |
| Experiment Setup | Yes | AdaBelief: We use the default parameters of Adam: β1 = 0.9, β2 = 0.999, ε = 10⁻⁸, α = 10⁻³. SGD, Fromage: We set the momentum as 0.9... We search learning rate among {10.0, 1.0, 0.1, 0.01, 0.001}. |
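
For reference, the hyperparameters reported above (β1 = 0.9, β2 = 0.999, ε = 10⁻⁸, α = 10⁻³) correspond to the AdaBelief update rule described as Algorithm 2 in the paper. Below is a minimal NumPy sketch of a single AdaBelief step; it is an illustration based on the published algorithm, not the authors' released implementation, and the function and variable names (`adabelief_step`, `theta`, `m`, `s`) are ours.

```python
import numpy as np

def adabelief_step(theta, grad, m, s, t,
                   lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdaBelief update (sketch of Algorithm 2 in the paper).

    theta : parameter vector
    grad  : gradient at theta
    m, s  : running first moment and "belief" (EMA of (g_t - m_t)^2)
    t     : 1-based step count, used for bias correction
    """
    m = beta1 * m + (1 - beta1) * grad                    # EMA of gradients
    s = beta2 * s + (1 - beta2) * (grad - m) ** 2 + eps   # EMA of squared deviation from m
    m_hat = m / (1 - beta1 ** t)                          # bias-corrected first moment
    s_hat = s / (1 - beta2 ** t)                          # bias-corrected belief term
    theta = theta - lr * m_hat / (np.sqrt(s_hat) + eps)   # parameter update
    return theta, m, s


# Toy usage: minimize f(x) = ||x||^2 from a random start.
theta = np.random.randn(5)
m = np.zeros_like(theta)
s = np.zeros_like(theta)
for t in range(1, 201):
    grad = 2 * theta            # gradient of ||x||^2
    theta, m, s = adabelief_step(theta, grad, m, s, t)
print(theta)                    # should be close to the zero vector
```

The only change relative to Adam is the second-moment accumulator: Adam tracks an EMA of the squared gradient g_t², while AdaBelief tracks an EMA of (g_t − m_t)², the deviation of the observed gradient from its EMA prediction, which is what gives the method its "belief in observed gradients" interpretation.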