Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Simple Convergence Proof of Adam and Adagrad

Authors: Alexandre Défossez, Léon Bottou, Francis Bach, Nicolas Usunier

TMLR 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We compare our bounds with experimental results, both on toy and real life problems in Section 6. On Figure 1, we compare the effective dependency of the average squared norm of the gradient in the parameters α, β1 and β2 for Adam, when used on a toy task and CIFAR-10. |
| Researcher Affiliation | Collaboration | Alexandre Défossez (Meta AI), Léon Bottou (Meta AI), Francis Bach (INRIA / PSL), Nicolas Usunier (Meta AI) |
| Pseudocode | No | The paper describes the adaptive methods (Adagrad and Adam) using mathematical equations and prose in Section 2.2, rather than structured pseudocode or an algorithm block. |
| Open Source Code | No | The paper neither states that source code for the described methodology is provided nor links to a code repository. |
| Open Datasets | Yes | We train a simple convolutional network (Gitman & Ginsburg, 2017) on the CIFAR-10 image classification dataset. ... https://www.cs.toronto.edu/~kriz/cifar.html |
| Dataset Splits | No | The paper mentions training on CIFAR-10 with a batch size of 128 for 600 epochs but does not specify how the dataset was split into training, validation, and test sets. |
| Hardware Specification | Yes | We train the model on a single V100 for 600 epochs with a batch size of 128, evaluating the full training gradient after each epoch. |
| Software Dependencies | No | The paper does not specify any software dependencies or version numbers used for implementation or experimentation. |
| Experiment Setup | Yes | All runs use the default config α = 10⁻³, β2 = 0.999 and β1 = 0.9, and we then change one of the parameters. ... We train the model on a single V100 for 600 epochs with a batch size of 128. |
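The Pseudocode row notes that the paper states the Adagrad and Adam updates only as equations in Section 2.2. For readers who want a concrete reference point, below is a minimal sketch of the standard update rules with the paper's hyperparameter names (α, β1, β2); the `eps` term and this vanilla formulation are assumptions for illustration, not the authors' code.

```python
# Minimal sketch of the standard Adam and Adagrad update rules.
# Hyperparameter names (alpha, beta1, beta2) follow the paper; the eps
# stabilizer and this exact formulation are assumptions, not the authors' code.
import numpy as np

def adam_step(param, grad, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update at step t >= 1; returns the new (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment EMA
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment EMA
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

def adagrad_step(param, grad, acc, alpha=1e-2, eps=1e-8):
    """One Adagrad update; acc accumulates squared gradients over all steps."""
    acc = acc + grad ** 2
    param = param - alpha * grad / (np.sqrt(acc) + eps)
    return param, acc
```

The key structural difference the paper's analysis exploits is visible here: Adagrad's denominator grows monotonically with the accumulated squared gradients, while Adam's is an exponential moving average.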
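The Experiment Setup row describes a one-at-a-time protocol: start from the default config (α = 10⁻³, β1 = 0.9, β2 = 0.999) and change a single parameter per run. A hedged sketch of that sweep logic, where the candidate value grids are invented for illustration:

```python
# Sketch of the vary-one-parameter sweep implied by the experiment setup.
# The default config matches the paper; the candidate grids are assumptions.
DEFAULT = {"alpha": 1e-3, "beta1": 0.9, "beta2": 0.999}

def one_at_a_time_sweep(grids):
    """Yield configs differing from DEFAULT in exactly one parameter."""
    for name, values in grids.items():
        for value in values:
            cfg = dict(DEFAULT)
            cfg[name] = value
            yield cfg
```

Each yielded config would then drive one full training run (600 epochs, batch size 128 on CIFAR-10, per the quotes above).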