Stochastic Modified Equations and Adaptive Stochastic Gradient Algorithms
Authors: Qianxiao Li, Cheng Tai, Weinan E
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We observe that cSGD is robust to different initial and maximum learning rates (...) and changing network structures, while obtaining similar performance to well-tuned versions of the other methods (see also Tab. 1). In particular, notice that the best learning rates found for Adagrad and Adam generally differ for different neural networks. On the other hand, many values can be used for cSGD with little performance loss. For brevity we only show the test accuracies, but the training accuracies have similar behavior (see SM. F.5). |
| Researcher Affiliation | Academia | 1Institute of High Performance Computing, Singapore 2Peking University, Beijing, China 3Beijing Institute of Big Data Research, Beijing, China 4Princeton University, Princeton, NJ, USA. |
| Pseudocode | Yes | Algorithm 1 controlled SGD (cSGD) |
| Open Source Code | No | The paper does not provide any explicit statement or link to open-source code for the described methodology. |
| Open Datasets | Yes | M0: a fully connected neural network with one hidden layer and ReLU activations, trained on the MNIST dataset (LeCun et al., 1998); C0: a fully connected neural network with two hidden layers and Tanh activations, trained on the CIFAR-10 dataset (Krizhevsky & Hinton, 2009); |
| Dataset Splits | No | The paper mentions training and test accuracies but does not provide explicit details about the train/validation/test splits used for the datasets. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory, or cloud instances). |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries). |
| Experiment Setup | Yes | For M0, we perform log-uniform random search with 50 samples over intervals: cSGD: u0 ∈ [1e-2, 1], η ∈ [1e-1, 1]; Adagrad: η ∈ [1e-3, 1]; Adam: η ∈ [1e-4, 1e-1]. For C0, we perform the same search over intervals: cSGD: u0 ∈ [1e-2, 1], η ∈ [1e-1, 1]; Adagrad: η ∈ [1e-3, 1]; Adam: η ∈ [1e-6, 1e-3]. We use mini-batches of size 128. |
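
The Open Datasets row quotes only the high-level shape of the two models (M0: one hidden layer with ReLU on MNIST; C0: two hidden layers with Tanh on CIFAR-10). As a rough illustration only, the sketch below writes these out in PyTorch; the hidden-layer widths are not stated in the quoted text and are placeholder assumptions, not the authors' configuration.

```python
import torch.nn as nn

# M0 (as quoted): fully connected, one hidden layer, ReLU, MNIST (28x28 -> 10 classes).
# The width 256 is an assumption for illustration.
m0 = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# C0 (as quoted): fully connected, two hidden layers, Tanh, CIFAR-10 (32x32x3 -> 10 classes).
# The widths 512 and 256 are assumptions for illustration.
c0 = nn.Sequential(
    nn.Flatten(),
    nn.Linear(32 * 32 * 3, 512),
    nn.Tanh(),
    nn.Linear(512, 256),
    nn.Tanh(),
    nn.Linear(256, 10),
)
```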
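
The Experiment Setup row describes a log-uniform random search with 50 samples over per-optimizer learning-rate intervals. A minimal sketch of that sampling procedure is shown below, using the intervals quoted for M0; the function name `log_uniform`, the random seed, and the dictionary layout are illustrative choices, not from the paper.

```python
import numpy as np

def log_uniform(low, high, size, rng):
    """Draw samples whose logarithm is uniform on [log(low), log(high)]."""
    return np.exp(rng.uniform(np.log(low), np.log(high), size))

rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch
n_samples = 50                  # 50 random-search samples, as quoted

# Search intervals for M0, taken from the Experiment Setup row above.
m0_search = {
    "cSGD":    {"u0":  log_uniform(1e-2, 1.0, n_samples, rng),
                "eta": log_uniform(1e-1, 1.0, n_samples, rng)},
    "Adagrad": {"eta": log_uniform(1e-3, 1.0, n_samples, rng)},
    "Adam":    {"eta": log_uniform(1e-4, 1e-1, n_samples, rng)},
}
# Each sampled configuration would then be trained with mini-batches of size 128.
```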