Escaping Saddle Points Faster with Stochastic Momentum
Authors: Jun-Kun Wang, Chi-Heng Lin, Jacob Abernethy
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also provide experimental findings that further validate these conclusions. Figure 2: Performance of SGD with different values of β = {0, 0.3, 0.5, 0.7, 0.9}; β = 0 corresponds to the standard SGD. |
| Researcher Affiliation | Academia | Jun-Kun Wang, Chi-Heng Lin, & Jacob Abernethy Georgia Institute of Technology {jimwang,cl3385,prof}@gatech.edu |
| Pseudocode | Yes | Algorithm 1: SGD with stochastic heavy ball momentum; Algorithm 2: SGD with stochastic heavy ball momentum |
| Open Source Code | No | The paper mentions popular software packages like PyTorch and TensorFlow as using their default momentum method, but it does not provide a link or explicit statement for the open-sourcing of *their own* implementation code for the described methodology. |
| Open Datasets | No | The paper defines objective functions (3) and (4) for the experiments, stating parameters like "n = 10", "n = 200, d = 10", and how data was sampled ("sampled w ∼ N(0, I_d/d) and a_i ∼ N(0, I_d)"). However, it does not refer to or provide access to a pre-existing, publicly available dataset in the conventional sense (e.g., a specific link, DOI, or citation to a named dataset repository). |
| Dataset Splits | No | The paper does not explicitly specify dataset splits (e.g., percentages or counts for training, validation, or test sets). It describes the problem setup and initialization for its experiments but does not detail how the data (which appears to be procedurally generated based on the objective functions) is partitioned for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions "PyTorch and Tensorflow" as popular software packages that use heavy ball momentum, but it does not specify the version numbers of these or any other software dependencies used in their experiments. |
| Experiment Setup | Yes | Figure 2: Performance of SGD with different values of β = {0, 0.3, 0.5, 0.7, 0.9}; β = 0 corresponds to the standard SGD. Fig. 4a: ... All the algorithms use the same step size η = 5 × 10⁻⁵. Fig. 4b: ... All the algorithms are initialized at the same point w_0 ∼ N(0, I_d/(10000d)) and use the same step size η = 5 × 10⁻⁴. Algorithm 1: Required: Step size parameter η and momentum parameter β. Algorithm 2: Required: Step size parameters r and η, momentum parameter β, and period parameter T_thred. |
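
The table lists Algorithm 1 (SGD with stochastic heavy ball momentum) with a step size η and a momentum parameter β, and no released code. The following is a minimal sketch of that kind of update, assuming the standard heavy-ball form m_t = β·m_{t-1} + g_t, w_{t+1} = w_t − η·m_t. It is not the authors' implementation: the function names, the toy saddle objective f(w) = ½(w₁² − w₂²), the noise scale, the step size, and the step count are all hypothetical choices for illustration and do not correspond to the paper's objectives (3) and (4) or its reported settings. Only the β sweep {0, 0.3, 0.5, 0.7, 0.9} is taken from the table.

```python
import numpy as np


def sgd_heavy_ball(grad_fn, w0, eta, beta, num_steps, rng):
    """SGD with heavy-ball momentum (assumed form, see lead-in).

    grad_fn(w, rng) returns a stochastic gradient at w.
    beta = 0 reduces to plain SGD.
    """
    w = np.array(w0, dtype=float)
    m = np.zeros_like(w)
    for _ in range(num_steps):
        g = grad_fn(w, rng)
        m = beta * m + g   # accumulate momentum
        w = w - eta * m    # step along the momentum direction
    return w


def noisy_saddle_grad(w, rng, noise_scale=1e-3):
    """Stochastic gradient of the hypothetical toy objective
    f(w) = 0.5 * (w[0]**2 - w[1]**2), which has a strict saddle at 0.
    Gaussian noise stands in for minibatch noise (an assumption)."""
    grad = np.array([w[0], -w[1]])
    return grad + noise_scale * rng.standard_normal(w.shape)


def escape_distance(beta, seed=0, eta=0.01, num_steps=200):
    """Distance travelled along the escape direction w[1] when starting
    exactly at the saddle point. All defaults are illustrative."""
    rng = np.random.default_rng(seed)
    w = sgd_heavy_ball(noisy_saddle_grad, np.zeros(2), eta, beta, num_steps, rng)
    return abs(w[1])


if __name__ == "__main__":
    # Same beta sweep as in the table's Figure 2 caption.
    for beta in (0.0, 0.3, 0.5, 0.7, 0.9):
        print(f"beta={beta}: |w[1]| = {escape_distance(beta):.3e}")
```

On this toy problem the iterate leaves the saddle only through the noise, so the printed distances give a rough feel for how β changes the escape speed; they say nothing about the paper's actual experiments.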