On the Implicit Bias of Adam
Authors: Matias D. Cattaneo, Jason Matthew Klusowski, Boris Shigida
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We also conduct numerical experiments and discuss how the proven facts can influence generalization." and "We provide numerical evidence consistent with our theoretical results by training various vision models on CIFAR10 using full-batch Adam." |
| Researcher Affiliation | Academia | 1Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ, USA. Correspondence to: Boris Shigida <bs1624@princeton.edu>. |
| Pseudocode | No | The paper provides mathematical definitions of algorithms and ODEs but does not include explicit pseudocode blocks or algorithms labeled as such. |
| Open Source Code | Yes | The code used for training the models is available at https://github.com/borshigida/implicit-bias-of-adam. |
| Open Datasets | Yes | We train Resnet-50, CNNs and Vision Transformers (Dosovitskiy et al., 2020) on the CIFAR-10 dataset with full-batch Adam. |
| Dataset Splits | No | The paper mentions training on CIFAR-10 and evaluating test accuracy, but it does not explicitly describe train/validation/test dataset splits by percentages, counts, or by referring to a standard split. |
| Hardware Specification | No | The paper mentions 'Princeton Research Computing resources' but does not specify any particular GPU/CPU models, processor types, or memory details. |
| Software Dependencies | No | The paper does not provide specific version numbers for any key software components or libraries used. |
| Experiment Setup | Yes | Definition 1.1. The Adam algorithm (Kingma & Ba, 2015) is an optimization algorithm with numerical stability hyperparameter ε > 0, squared gradient momentum hyperparameter ρ ∈ (0, 1), gradient momentum hyperparameter β ∈ (0, 1), initialization θ(0) ∈ ℝ^p, ν(0) = 0 ∈ ℝ^p, m(0) = 0 ∈ ℝ^p and the following update rule: for each n ≥ 0, j ∈ {1, . . . , p} ... Figures 4 and 5 also specify experimental hyperparameters such as ε = 10^-8, β = 0.99 and ρ = 0.999 (a code sketch of this update appears below the table). |
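
For reference, here is a minimal NumPy sketch of the Adam update from Definition 1.1, using the hyperparameters reported for the figures (ε = 10^-8, β = 0.99, ρ = 0.999). The placement of ε outside the square root and the use of bias correction follow the common convention from Kingma & Ba (2015); the paper's exact Definition 1.1 may differ in these details.

```python
import numpy as np

def adam_step(theta, m, v, grad, n, lr=1e-3, beta=0.99, rho=0.999, eps=1e-8):
    """One Adam step (Kingma & Ba, 2015), in the paper's notation:
    beta = gradient momentum, rho = squared-gradient momentum,
    eps = numerical stability constant, n = 0-based iteration index."""
    m = beta * m + (1 - beta) * grad          # first-moment estimate m(n+1)
    v = rho * v + (1 - rho) * grad**2         # second-moment estimate v(n+1)
    m_hat = m / (1 - beta**(n + 1))           # bias-corrected first moment
    v_hat = v / (1 - rho**(n + 1))            # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # coordinate-wise update
    return theta, m, v

# Usage on a toy quadratic objective f(theta) = 0.5 * ||theta||^2
theta = np.ones(5)
m = np.zeros_like(theta)                      # m(0) = 0
v = np.zeros_like(theta)                      # v(0) = 0
for n in range(100):
    grad = theta                              # gradient of the toy objective
    theta, m, v = adam_step(theta, m, v, grad, n)
```

The experiments train vision models on CIFAR-10 with full-batch Adam. Below is a hypothetical PyTorch sketch of full-batch training, in which the gradient is accumulated over the entire training set before each Adam step; the model choice, learning rate, and number of steps are assumptions for illustration, not the paper's settings (only β = 0.99, ρ = 0.999, and ε = 10^-8 are taken from the reported hyperparameters).

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

device = "cuda" if torch.cuda.is_available() else "cpu"

train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=T.ToTensor())
loader = torch.utils.data.DataLoader(train_set, batch_size=1024, shuffle=False)

# Model and learning rate are illustrative choices, not the paper's configuration.
model = torchvision.models.resnet50(num_classes=10).to(device)
criterion = nn.CrossEntropyLoss(reduction="sum")
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4,
                             betas=(0.99, 0.999), eps=1e-8)

for step in range(10):                        # number of full-batch steps is assumed
    optimizer.zero_grad()
    total_loss = 0.0
    for x, y in loader:                       # accumulate gradients over the full set
        x, y = x.to(device), y.to(device)
        loss = criterion(model(x), y) / len(train_set)
        loss.backward()                       # gradients sum across mini-batches
        total_loss += loss.item()
    optimizer.step()                          # one Adam step on the full-batch gradient
    print(f"step {step}: train loss {total_loss:.4f}")
```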