Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients
Authors: Lukas Balles, Philipp Hennig
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 6 presents experimental results. |
| Researcher Affiliation | Academia | Max Planck Institute for Intelligent Systems, Tübingen, Germany. |
| Pseudocode | Yes | Alg. 1 provides pseudo-code (ignoring the details discussed in 4.4 for readability). |
| Open Source Code | Yes | A TensorFlow (Abadi et al., 2015) implementation can be found at https://github.com/lballes/msvag. |
| Open Datasets | Yes | P1 A vanilla convolutional neural network (CNN) with two convolutional and two fully-connected layers on the Fashion-MNIST data set (Xiao et al., 2017). P2 A vanilla CNN with three convolutional and three fully-connected layers on CIFAR-10 (Krizhevsky, 2009). P3 The wide residual network WRN-40-4 architecture of Zagoruyko & Komodakis (2016) on CIFAR-100. P4 A two-layer LSTM (Hochreiter & Schmidhuber, 1997) for character-level language modelling on Tolstoy's War and Peace. |
| Dataset Splits | No | No explicit mention of validation dataset splits or methodology for validation was found. The paper describes tuning based on 'maximal test accuracy' but does not specify a separate validation set for this process. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running experiments were mentioned. |
| Software Dependencies | No | TensorFlow is mentioned as the implementation framework, but no specific version number for it or any other software dependencies is provided. |
| Experiment Setup | Yes | For all experiments, we used β = 0.9 for M-SGD, M-SSD and M-SVAG and default parameters (β1 = 0.9, β2 = 0.999, ε = 10⁻⁸) for ADAM. The global step size α was tuned for each method individually by first finding the maximal stable step size by trial and error, then searching downwards. |
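
The sketch below is not the authors' code; it is a minimal illustration, in TensorFlow, of the setup quoted in the Open Datasets and Experiment Setup rows: problem P1 (a vanilla CNN with two convolutional and two fully-connected layers on Fashion-MNIST) trained with the ADAM defaults reported above, with the global step size α searched downward over an assumed grid. Layer widths, the step-size grid, batch size, and epoch count are illustrative assumptions not taken from the paper.

```python
# Hedged sketch of the reported setup (assumed details marked in comments).
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0

def build_cnn():
    # "Vanilla" CNN for P1: two convolutional and two fully-connected layers.
    # Filter counts and dense width are assumptions, not values from the paper.
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 5, activation="relu", input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 5, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(10),
    ])

# Step-size tuning as described: start near a maximal stable step size and
# search downwards. The grid values here are assumptions.
for alpha in [1e-3, 3e-4, 1e-4]:
    model = build_cnn()
    opt = tf.keras.optimizers.Adam(learning_rate=alpha,
                                   beta_1=0.9, beta_2=0.999, epsilon=1e-8)
    model.compile(optimizer=opt,
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=["accuracy"])
    model.fit(x_train, y_train, batch_size=128, epochs=5,
              validation_data=(x_test, y_test), verbose=2)
```

The same pattern would apply to M-SGD, M-SSD, and M-SVAG with β = 0.9, using the optimizers from the authors' repository (https://github.com/lballes/msvag) in place of `tf.keras.optimizers.Adam`.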