Escaping Saddle Points Faster with Stochastic Momentum

Authors: Jun-Kun Wang, Chi-Heng Lin, Jacob Abernethy

ICLR 2020

Reproducibility Variable Result LLM Response
Research Type Experimental We also provide experimental findings that further validate these conclusions. Figure 2: Performance of SGD with different values of β = {0, 0.3, 0.5, 0.7, 0.9}; β = 0 corresponds to the standard SGD.
Researcher Affiliation Academia Jun-Kun Wang, Chi-Heng Lin, & Jacob Abernethy Georgia Institute of Technology {jimwang,cl3385,prof}@gatech.edu
Pseudocode Yes Algorithm 1: SGD with stochastic heavy ball momentum; Algorithm 2: SGD with stochastic heavy ball momentum
Open Source Code No The paper notes that popular software packages such as PyTorch and TensorFlow use heavy ball momentum as their default momentum method, but it provides neither a link to nor an explicit statement about open-sourcing the authors' own implementation of the described methodology.
Open Datasets No The paper defines objective functions (3) and (4) for the experiments, stating parameters like "n = 10", "n = 200, d = 10", and how data was sampled ("sampled w ∼ N(0, I_d/d) and a_i ∼ N(0, I_d)"). However, it does not refer to or provide access to a pre-existing, publicly available dataset in the conventional sense (e.g., a specific link, DOI, or citation to a named dataset repository).
Dataset Splits No The paper does not explicitly specify dataset splits (e.g., percentages or counts for training, validation, or test sets). It describes the problem setup and initialization for its experiments but does not detail how the data (which appears to be procedurally generated based on the objective functions) is partitioned for training, validation, or testing.
Hardware Specification No The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments.
Software Dependencies No The paper mentions "PyTorch and TensorFlow" as popular software packages that use heavy ball momentum, but it does not specify the version numbers of these or any other software dependencies used in their experiments.
Experiment Setup Yes Figure 2: Performance of SGD with different values of β = {0, 0.3, 0.5, 0.7, 0.9}; β = 0 corresponds to the standard SGD. Fig. 4a: ... All the algorithms use the same step size η = 5 × 10^{-5}. Fig. 4b: ... All the algorithms are initialized at the same point w_0 ∼ N(0, I_d/(10000d)) and use the same step size η = 5 × 10^{-4}. Algorithm 1: Required: Step size parameter η and momentum parameter β. Algorithm 2: Required: Step size parameters r and η, momentum parameter β, and period parameter T_thred.
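
The role of the two parameters that Algorithm 1 requires, the step size η and the momentum parameter β, can be illustrated with a short sketch. It assumes the standard heavy ball form m_t = β m_{t-1} + g_t, w_{t+1} = w_t − η m_t, which may differ in detail from the paper's pseudocode. The toy objective f(x, y) = x^4/4 − x^2/2 + y^2/2 (a strict saddle at the origin, minima at (±1, 0)), the noise level, the step size 0.05, and the names sgd_heavy_ball, toy_saddle_grad, and run_demo are all hypothetical choices made for illustration; they are not the paper's objectives (3) and (4) or its experimental settings.

```python
import random


def sgd_heavy_ball(grad_fn, w0, eta, beta, num_steps):
    """SGD with heavy ball momentum (assumed standard form):
    m_t = beta * m_{t-1} + g_t,  w_{t+1} = w_t - eta * m_t.
    With beta = 0 this reduces to plain SGD."""
    w = list(w0)
    m = [0.0] * len(w)
    for _ in range(num_steps):
        g = grad_fn(w)  # stochastic gradient at the current iterate
        m = [beta * mi + gi for mi, gi in zip(m, g)]
        w = [wi - eta * mi for wi, mi in zip(w, m)]
    return w


def toy_saddle_grad(rng, noise_std=0.01):
    """Noisy gradient of the hypothetical toy objective
    f(x, y) = x**4 / 4 - x**2 / 2 + y**2 / 2."""
    def grad_fn(w):
        x, y = w
        return [x ** 3 - x + rng.gauss(0.0, noise_std),
                y + rng.gauss(0.0, noise_std)]
    return grad_fn


def run_demo(beta, seed=0, eta=0.05, num_steps=2000):
    """Start near the saddle at the origin and return the final iterate."""
    rng = random.Random(seed)
    w0 = [rng.gauss(0.0, 1e-3), rng.gauss(0.0, 1e-3)]
    return sgd_heavy_ball(toy_saddle_grad(rng), w0, eta, beta, num_steps)


if __name__ == "__main__":
    # Same grid of momentum values as quoted for Figure 2.
    for beta in (0.0, 0.3, 0.5, 0.7, 0.9):
        x, y = run_demo(beta)
        print(f"beta={beta}: final iterate = ({x:.3f}, {y:.3f})")
```

Setting beta to 0 recovers the standard SGD baseline mentioned in the Figure 2 caption. The sketch only shows how the update is parameterized; it makes no claim about the escape-time behavior reported in the paper.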