Stochastic Modified Equations and Adaptive Stochastic Gradient Algorithms
Authors: Qianxiao Li, Cheng Tai, Weinan E
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We observe that cSGD is robust to different initial and maximum learning rates (...) and changing network structures, while obtaining similar performance to well-tuned versions of the other methods (see also Tab. 1). In particular, notice that the best learning rates found for Adagrad and Adam generally differ for different neural networks. On the other hand, many values can be used for cSGD with little performance loss. For brevity we only show the test accuracies, but the training accuracies have similar behavior (see SM. F.5). |
| Researcher Affiliation | Academia | 1Institute of High Performance Computing, Singapore 2Peking University, Beijing, China 3Beijing Institute of Big Data Research, Beijing, China 4Princeton University, Princeton, NJ, USA. |
| Pseudocode | Yes | Algorithm 1 controlled SGD (cSGD) |
| Open Source Code | No | The paper does not provide any explicit statement or link to open-source code for the described methodology. |
| Open Datasets | Yes | M0: a fully connected neural network with one hidden layer and ReLU activations, trained on the MNIST dataset (LeCun et al., 1998); C0: a fully connected neural network with two hidden layers and Tanh activations, trained on the CIFAR-10 dataset (Krizhevsky & Hinton, 2009); |
| Dataset Splits | No | The paper mentions training and test accuracies but does not provide explicit details about the train/validation/test splits used for the datasets. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory, or cloud instances). |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries). |
| Experiment Setup | Yes | For M0, we perform log-uniform random search with 50 samples over intervals: cSGD: u0 ∈ [1e-2, 1], η ∈ [1e-1, 1]; Adagrad: η ∈ [1e-3, 1]; Adam: η ∈ [1e-4, 1e-1]. For C0, we perform the same search over intervals: cSGD: u0 ∈ [1e-2, 1], η ∈ [1e-1, 1]; Adagrad: η ∈ [1e-3, 1]; Adam: η ∈ [1e-6, 1e-3]. We use mini-batches of size 128. |
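
The Open Datasets row quotes only the high-level shape of the two models (M0: one hidden layer with ReLU on MNIST; C0: two hidden layers with Tanh on CIFAR-10). As a rough illustration only, the sketch below writes these out in PyTorch; the hidden-layer widths are not stated in the quoted text and are placeholder assumptions, not the authors' configuration.

```python
import torch.nn as nn

# M0 (as quoted): fully connected, one hidden layer, ReLU, MNIST (28x28 -> 10 classes).
# The width 256 is an assumption for illustration.
m0 = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# C0 (as quoted): fully connected, two hidden layers, Tanh, CIFAR-10 (32x32x3 -> 10 classes).
# The widths 512 and 256 are assumptions for illustration.
c0 = nn.Sequential(
    nn.Flatten(),
    nn.Linear(32 * 32 * 3, 512),
    nn.Tanh(),
    nn.Linear(512, 256),
    nn.Tanh(),
    nn.Linear(256, 10),
)
```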
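
The Experiment Setup row describes a log-uniform random search with 50 samples over per-optimizer learning-rate intervals. A minimal sketch of that sampling procedure is shown below, using the intervals quoted for M0; the function name `log_uniform`, the random seed, and the dictionary layout are illustrative choices, not from the paper.

```python
import numpy as np

def log_uniform(low, high, size, rng):
    """Draw samples whose logarithm is uniform on [log(low), log(high)]."""
    return np.exp(rng.uniform(np.log(low), np.log(high), size))

rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch
n_samples = 50                  # 50 random-search samples, as quoted

# Search intervals for M0, taken from the Experiment Setup row above.
m0_search = {
    "cSGD":    {"u0":  log_uniform(1e-2, 1.0, n_samples, rng),
                "eta": log_uniform(1e-1, 1.0, n_samples, rng)},
    "Adagrad": {"eta": log_uniform(1e-3, 1.0, n_samples, rng)},
    "Adam":    {"eta": log_uniform(1e-4, 1e-1, n_samples, rng)},
}
# Each sampled configuration would then be trained with mini-batches of size 128.
```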