Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients

Authors: Lukas Balles, Philipp Hennig

ICML 2018

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 6 presents experimental results. |
| Researcher Affiliation | Academia | Max Planck Institute for Intelligent Systems, Tübingen, Germany. |
| Pseudocode | Yes | Alg. 1 provides pseudo-code (ignoring the details discussed in 4.4 for readability); see the sketch after this table. |
| Open Source Code | Yes | A TensorFlow (Abadi et al., 2015) implementation can be found at https://github.com/lballes/msvag. |
| Open Datasets | Yes | P1: a vanilla convolutional neural network (CNN) with two convolutional and two fully-connected layers on the Fashion-MNIST data set (Xiao et al., 2017). P2: a vanilla CNN with three convolutional and three fully-connected layers on CIFAR-10 (Krizhevsky, 2009). P3: the wide residual network WRN-40-4 architecture of Zagoruyko & Komodakis (2016) on CIFAR-100. P4: a two-layer LSTM (Hochreiter & Schmidhuber, 1997) for character-level language modelling on Tolstoy's War and Peace. |
| Dataset Splits | No | No explicit mention of validation dataset splits or a validation methodology was found. The paper describes tuning based on "maximal test accuracy" but does not specify a separate validation set for this process. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running the experiments are mentioned. |
| Software Dependencies | No | TensorFlow is mentioned as the implementation framework, but no version number for it or any other software dependency is provided. |
| Experiment Setup | Yes | For all experiments, β = 0.9 was used for M-SGD, M-SSD and M-SVAG, and default parameters (β₁ = 0.9, β₂ = 0.999, ε = 10⁻⁸) for ADAM. The global step size α was tuned for each method individually by first finding the maximal stable step size by trial and error, then searching downwards; see the second sketch below. |
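For reference, here is a minimal NumPy sketch of one M-SVAG update in the spirit of Alg. 1, assembled from the quantities the paper defines (moving averages m and v, the correction factor ρ(β, t), the variance estimate s, and the variance-adaptation factors γ). The exact bias corrections and the numerical corner cases of Section 4.4 should be checked against the paper's pseudo-code; the damping constant eps below is an added guard, not part of the algorithm.

```python
import numpy as np

def m_svag_step(theta, grad, m, v, t, alpha=0.1, beta=0.9, eps=1e-16):
    """One M-SVAG update; a sketch in the spirit of Alg. 1 of the paper.

    theta, grad, m, v: arrays of the same shape; t: 1-based step count.
    Returns the updated (theta, m, v). eps is an added numerical guard.
    """
    # Exponential moving averages of the gradient and its elementwise square.
    m = beta * m + (1.0 - beta) * grad
    v = beta * v + (1.0 - beta) * grad ** 2

    # Bias-corrected estimates, as in Adam.
    m_hat = m / (1.0 - beta ** t)
    v_hat = v / (1.0 - beta ** t)

    # rho(beta, t): sum of squared (normalized) EMA weights; it tends to
    # (1 - beta) / (1 + beta) as t grows and equals 1 at t = 1, one of the
    # corner cases Section 4.4 of the paper discusses; eps guards it here.
    rho = (1.0 - beta) * (1.0 + beta ** t) / ((1.0 + beta) * (1.0 - beta ** t))

    # Estimate of the stochastic-gradient variance from the two averages.
    s = (v_hat - m_hat ** 2) / max(1.0 - rho, eps)

    # Variance-adaptation factors gamma in [0, 1], applied elementwise.
    gamma = m_hat ** 2 / (m_hat ** 2 + rho * s + eps)

    return theta - alpha * gamma * m_hat, m, v
```

The factor γ shrinks the step for coordinates whose gradient estimates are noisy (s large relative to m̂²) and leaves low-variance coordinates close to a plain momentum-SGD step, which is the variance adaptation the paper's title refers to.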
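The step-size protocol in the last table row can likewise be made concrete. The following is a hypothetical sketch only: the paper states that α was tuned per method by first finding the maximal stable step size and then searching downwards, but the grid spacing, the number of trial values, and the `train_and_evaluate` helper are assumptions here, not details from the paper.

```python
def tune_step_size(train_and_evaluate, alpha_max, num_trials=6, factor=10 ** 0.5):
    """Search downwards from the maximal stable step size alpha_max.

    train_and_evaluate(alpha) is a hypothetical helper that trains one model
    with global step size alpha and returns its maximal test accuracy; the
    half-decade grid (factor = sqrt(10)) is an assumption, not the paper's.
    """
    best_alpha, best_score = None, float("-inf")
    alpha = alpha_max
    for _ in range(num_trials):
        score = train_and_evaluate(alpha)
        if score > best_score:
            best_alpha, best_score = alpha, score
        alpha /= factor  # step down the grid
    return best_alpha


# Fixed optimizer hyperparameters as reported in the paper.
BETA_MOMENTUM = 0.9                                   # M-SGD, M-SSD, M-SVAG
ADAM_DEFAULTS = dict(beta1=0.9, beta2=0.999, epsilon=1e-8)
```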