Adam: A Method for Stochastic Optimization

Authors: Diederik P. Kingma and Jimmy Lei Ba

ICLR 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods." and "To empirically evaluate the proposed method, we investigated different popular machine learning models, including logistic regression, multilayer fully connected neural networks and deep convolutional neural networks."
Researcher Affiliation | Collaboration | Diederik P. Kingma (University of Amsterdam, OpenAI), dpkingma@openai.com; Jimmy Lei Ba (University of Toronto), jimmy@psi.utoronto.ca
Pseudocode | Yes | "Algorithm 1: Adam, our proposed algorithm for stochastic optimization." A code sketch of this update rule appears after the table.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets | Yes | "We evaluate our proposed method on L2-regularized multi-class logistic regression using the MNIST dataset." and "We examine the sparse feature problem using IMDB movie review dataset from (Maas et al., 2011)." and "Our CNN architecture... CIFAR-10 with c64-c64-c128-1000 architecture." and "We vary the β1 and β2 when training a variational autoencoder (VAE) with the same architecture as in (Kingma & Welling, 2013)." A sketch for fetching these public datasets appears after the table.
Dataset Splits | No | The paper mentions searching hyper-parameters over a dense grid, which implies a validation process, but it does not provide specific dataset split percentages, sample counts, or an explicit methodology for creating training/validation/test splits.
Hardware Specification | No | The paper notes that "Experiments in this work were partly carried out on the Dutch national e-infrastructure with the support of SURF Foundation" but does not provide specific hardware details such as GPU or CPU models, processor types, or memory specifications used for the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library names like PyTorch, TensorFlow, or scikit-learn with their versions) that would be needed to replicate the experiments.
Experiment Setup | Yes | "Good default settings for the tested machine learning problems are α = 0.001, β1 = 0.9, β2 = 0.999 and ϵ = 10^-8." and "minibatch size of 128." and "The hyper-parameters, such as learning rate and momentum, are searched over a dense grid and the results are reported using the best hyper-parameter setting." A configuration sketch using these defaults appears after the table.
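
For reference, Algorithm 1 maps directly onto a few lines of NumPy. The following is a minimal sketch of the update rule as stated in the paper, using the paper's default hyper-parameters; the function name and the toy objective in the usage loop are ours, not the authors'.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (Algorithm 1), with the paper's default hyper-parameters."""
    m = beta1 * m + (1 - beta1) * grad        # biased first moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # biased second raw moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)              # bias-corrected second moment
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Usage: minimize f(theta) = theta^2 from noisy gradients (toy example, ours).
theta = np.array([1.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 1001):  # t starts at 1, as required by the bias correction
    grad = 2 * theta + 0.1 * np.random.randn(*theta.shape)
    theta, m, v = adam_step(theta, grad, m, v, t)
```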
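All four experiments rely on datasets that remain publicly available. The paper predates today's dataset loaders, so the snippet below is only one convenient way to obtain MNIST and CIFAR-10 (via torchvision, an assumption on our part, not something the authors used); the IMDB reviews are distributed by Maas et al. (2011) at https://ai.stanford.edu/~amaas/data/sentiment/.

```python
# Hypothetical fetch of the paper's public datasets via torchvision;
# the authors did not use this library (the paper predates it).
from torchvision import datasets

mnist_train = datasets.MNIST(root="./data", train=True, download=True)
mnist_test = datasets.MNIST(root="./data", train=False, download=True)
cifar_train = datasets.CIFAR10(root="./data", train=True, download=True)
cifar_test = datasets.CIFAR10(root="./data", train=False, download=True)
# IMDB reviews (Maas et al., 2011): https://ai.stanford.edu/~amaas/data/sentiment/
```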
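The reported defaults also translate directly into a modern optimizer configuration. The sketch below is a hedged illustration: torch.optim.Adam exposes the same α, β1, β2, and ϵ, while the stand-in model and the grid of learning rates are illustrative guesses at the paper's "dense grid" protocol, not values taken from the paper.

```python
import torch

# Stand-in model (ours): roughly the paper's logistic regression on MNIST.
model = torch.nn.Linear(784, 10)

# Paper defaults: alpha = 0.001, beta1 = 0.9, beta2 = 0.999, eps = 1e-8.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999), eps=1e-8)

# The paper searches hyper-parameters over a dense grid and reports the best
# setting; these grid values are illustrative, not the paper's.
for lr in [1e-4, 3e-4, 1e-3, 3e-3, 1e-2]:
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, betas=(0.9, 0.999), eps=1e-8)
    # ... train with minibatch size 128 and keep the best-performing setting ...
```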