Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients

Authors: Lukas Balles, Philipp Hennig

ICML 2018

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 6 presents experimental results. |
| Researcher Affiliation | Academia | Max Planck Institute for Intelligent Systems, Tübingen, Germany. |
| Pseudocode | Yes | Alg. 1 provides pseudo-code (ignoring the details discussed in 4.4 for readability); see the sketch after this table. |
| Open Source Code | Yes | A TensorFlow (Abadi et al., 2015) implementation can be found at https://github.com/lballes/msvag. |
| Open Datasets | Yes | P1: a vanilla convolutional neural network (CNN) with two convolutional and two fully-connected layers on the Fashion-MNIST data set (Xiao et al., 2017). P2: a vanilla CNN with three convolutional and three fully-connected layers on CIFAR-10 (Krizhevsky, 2009). P3: the wide residual network WRN-40-4 architecture of Zagoruyko & Komodakis (2016) on CIFAR-100. P4: a two-layer LSTM (Hochreiter & Schmidhuber, 1997) for character-level language modelling on Tolstoy's War and Peace. |
| Dataset Splits | No | No explicit mention of validation dataset splits or a validation methodology was found. The paper describes tuning based on "maximal test accuracy" but does not specify a separate validation set for this process. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running the experiments are mentioned. |
| Software Dependencies | No | TensorFlow is mentioned as the implementation framework, but no version number for it or any other software dependency is provided. |
| Experiment Setup | Yes | For all experiments, β = 0.9 was used for M-SGD, M-SSD and M-SVAG, and default parameters (β₁ = 0.9, β₂ = 0.999, ε = 10⁻⁸) for ADAM. The global step size α was tuned for each method individually by first finding the maximal stable step size by trial and error, then searching downwards; see the second sketch below. |
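For reference, here is a minimal NumPy sketch of one M-SVAG update in the spirit of Alg. 1, assembled from the quantities the paper defines (moving averages m and v, the correction factor ρ(β, t), the variance estimate s, and the variance-adaptation factors γ). The exact bias corrections and the numerical corner cases of Section 4.4 should be checked against the paper's pseudo-code; the damping constant eps below is an added guard, not part of the algorithm.

```python
import numpy as np

def m_svag_step(theta, grad, m, v, t, alpha=0.1, beta=0.9, eps=1e-16):
    """One M-SVAG update; a sketch in the spirit of Alg. 1 of the paper.

    theta, grad, m, v: arrays of the same shape; t: 1-based step count.
    Returns the updated (theta, m, v). eps is an added numerical guard.
    """
    # Exponential moving averages of the gradient and its elementwise square.
    m = beta * m + (1.0 - beta) * grad
    v = beta * v + (1.0 - beta) * grad ** 2

    # Bias-corrected estimates, as in Adam.
    m_hat = m / (1.0 - beta ** t)
    v_hat = v / (1.0 - beta ** t)

    # rho(beta, t): sum of squared (normalized) EMA weights; it tends to
    # (1 - beta) / (1 + beta) as t grows and equals 1 at t = 1, one of the
    # corner cases Section 4.4 of the paper discusses; eps guards it here.
    rho = (1.0 - beta) * (1.0 + beta ** t) / ((1.0 + beta) * (1.0 - beta ** t))

    # Estimate of the stochastic-gradient variance from the two averages.
    s = (v_hat - m_hat ** 2) / max(1.0 - rho, eps)

    # Variance-adaptation factors gamma in [0, 1], applied elementwise.
    gamma = m_hat ** 2 / (m_hat ** 2 + rho * s + eps)

    return theta - alpha * gamma * m_hat, m, v
```

The factor γ shrinks the step for coordinates whose gradient estimates are noisy (s large relative to m̂²) and leaves low-variance coordinates close to a plain momentum-SGD step, which is the variance adaptation the paper's title refers to.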
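The step-size protocol in the last table row can likewise be made concrete. The following is a hypothetical sketch only: the paper states that α was tuned per method by first finding the maximal stable step size and then searching downwards, but the grid spacing, the number of trial values, and the `train_and_evaluate` helper are assumptions here, not details from the paper.

```python
def tune_step_size(train_and_evaluate, alpha_max, num_trials=6, factor=10 ** 0.5):
    """Search downwards from the maximal stable step size alpha_max.

    train_and_evaluate(alpha) is a hypothetical helper that trains one model
    with global step size alpha and returns its maximal test accuracy; the
    half-decade grid (factor = sqrt(10)) is an assumption, not the paper's.
    """
    best_alpha, best_score = None, float("-inf")
    alpha = alpha_max
    for _ in range(num_trials):
        score = train_and_evaluate(alpha)
        if score > best_score:
            best_alpha, best_score = alpha, score
        alpha /= factor  # step down the grid
    return best_alpha


# Fixed optimizer hyperparameters as reported in the paper.
BETA_MOMENTUM = 0.9                                   # M-SGD, M-SSD, M-SVAG
ADAM_DEFAULTS = dict(beta1=0.9, beta2=0.999, epsilon=1e-8)
```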