Aggregated Momentum: Stability Through Passive Damping

Authors: James Lucas, Shengyang Sun, Richard Zemel, Roger Grosse

ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate AggMo empirically we compare against other commonly used optimizers on a range of deep learning architectures: deep autoencoders, convolutional networks, and long short-term memory (LSTM) networks.
Researcher Affiliation | Academia | James Lucas, Shengyang Sun, Richard Zemel, Roger Grosse; University of Toronto; Vector Institute; {jlucas, ssy, zemel, rgrosse}@cs.toronto.edu
Pseudocode | No | The paper describes the AggMo update rule using mathematical equations (Equation 3) but does not provide a formal pseudocode or algorithm block. (A sketch of this update rule follows the table.)
Open Source Code | No | The paper does not provide a link to open-source code or explicitly state that the code is publicly available.
Open Datasets | Yes | To do so we used four datasets: MNIST (LeCun et al., 1998), CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009) and Penn Treebank (Marcus et al., 1993).
Dataset Splits | Yes | For these experiments the training set consists of 90% of the training data, with the remaining 10% being used for validation.
Hardware Specification | No | The paper mentions that experiments were conducted using the PyTorch library but does not specify any hardware details such as GPU or CPU models.
Software Dependencies | No | All of our experiments are conducted using the PyTorch library (Paszke et al., 2017). The paper mentions PyTorch but does not provide a specific version number for it or for other software dependencies.
Experiment Setup | Yes | For CM and Nesterov we evaluated damping coefficients in the range {0.0, 0.9, 0.99, 0.999}. For Adam, it is standard to use β1 = 0.9 and β2 = 0.999. Since β1 is analogous to the momentum damping parameter, we considered β1 ∈ {0.9, 0.99, 0.999} and kept β2 = 0.999. For AggMo, we explored K ∈ {2, 3, 4}. Each model was trained for 1000 epochs. ... We train for a total of 1000 epochs using a multiplicative learning rate decay of 0.1 at epochs 200, 400, and 800. We train using batch sizes of 200. (A sketch of this schedule follows the table.)
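
Since the paper states the AggMo update only as equations (Equation 3; see the Pseudocode row above), the following is a minimal Python/NumPy sketch of that update rule as we read it: K velocity vectors, each damped by its own coefficient, with the parameter step taken as their average. The function and variable names (aggmo_step, velocities, betas) are illustrative rather than from the paper, and the example damping vector (0.0, 0.9, 0.99) reflects the scale-spaced choice the paper reports for K = 3.

```python
import numpy as np

def aggmo_step(params, velocities, grad, lr, betas):
    """One AggMo-style update: K damped velocities, averaged parameter step.

    Sketch of the update rule described in the paper (Equation 3);
    names and structure here are illustrative, not the authors' code.
    """
    K = len(betas)
    for i, beta in enumerate(betas):
        # Each velocity keeps its own damping coefficient.
        velocities[i] = beta * velocities[i] - grad
    # The parameter update averages the K velocities, scaled by the learning rate.
    params = params + (lr / K) * sum(velocities)
    return params, velocities

# Example with K = 3 and damping coefficients (0.0, 0.9, 0.99).
betas = [1.0 - 0.1 ** i for i in range(3)]
params = np.zeros(4)
velocities = [np.zeros_like(params) for _ in betas]
grad = np.ones_like(params)  # stand-in gradient for illustration
params, velocities = aggmo_step(params, velocities, grad, lr=0.1, betas=betas)
```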
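
For the training schedule quoted in the Experiment Setup row (1000 epochs, multiplicative learning rate decay of 0.1 at epochs 200, 400, and 800, batch size 200), here is a hedged PyTorch sketch. The toy model, random data, and the SGD-with-momentum optimizer are placeholders (AggMo itself is not part of torch.optim); only the epoch count, decay milestones, decay factor, and batch size come from the quoted text.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Toy data and model stand in for the paper's autoencoder / ConvNet / LSTM experiments.
data = TensorDataset(torch.randn(1000, 10), torch.randn(1000, 1))
loader = DataLoader(data, batch_size=200)  # batch size 200, as in the quoted setup

model = nn.Linear(10, 1)
# Placeholder optimizer: plain SGD with momentum instead of AggMo.
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# Multiplicative decay of 0.1 at epochs 200, 400, and 800, as described in the setup.
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[200, 400, 800], gamma=0.1)

for epoch in range(1000):  # 1000 epochs total
    for x, y in loader:
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()
```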