SGD Can Converge to Local Maxima
Authors: Liu Ziyin, Botao Li, James B Simon, Masahito Ueda
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also realize results in a minimal neural network-like example. In Sec. 6, we present the numerical simulations, including a minimal example involving a neural network. |
| Researcher Affiliation | Academia | (1) The University of Tokyo (2) ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris (3) University of California, Berkeley |
| Pseudocode | No | The paper defines its algorithms (e.g., SGD and AMSGrad) through mathematical update equations rather than step-by-step pseudocode (a standard-form sketch of these update rules appears after the table). |
| Open Source Code | No | The paper does not provide any statement about releasing source code or links to a code repository. |
| Open Datasets | No | The paper does not use any publicly available datasets; its experiments are synthetic toy examples, including a minimal neural-network example (Sec. 6). |
| Dataset Splits | No | The paper does not specify explicit training/validation/test dataset splits for any of its experiments, including the toy neural network example. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running experiments (e.g., specific GPU/CPU models, memory, or cloud computing instances). |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | In this numerical example, we set λ = 0.8 and a = −1... we set λ = 0.2 and β2 = 0.999 for both Adam and AMSGrad. When momentum is used, we set β1 = 0.9. GD is run with a learning rate of 0.01. ...w1 is initialized uniformly in [−1,1]; w2 is initialized uniformly in [0,1]... at a small learning rate (λ = 0.001)... when the learning rate is large (λ = 0.1). (These reported settings are wired into the second sketch below.) |
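
The Pseudocode row notes that SGD and AMSGrad appear in the paper only as update equations. The following is a minimal Python sketch of the standard SGD and AMSGrad update rules (AMSGrad as in Reddi et al., 2018); the variable names, class structure, and default values are illustrative and are not taken from the paper's notation, though the defaults mirror the hyperparameters quoted in the Experiment Setup row.

```python
# Minimal sketch: textbook SGD and AMSGrad update rules.
# Illustrative only; not the paper's notation or code.
import numpy as np

def sgd_step(w, grad, lr):
    """Plain (stochastic) gradient descent step: w <- w - lr * grad."""
    return w - lr * grad

class AMSGrad:
    """Standard AMSGrad update; beta1 > 0 adds momentum (the paper reports beta1 = 0.9 when momentum is used)."""
    def __init__(self, lr=0.2, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = self.v = self.v_hat = None

    def step(self, w, grad):
        if self.m is None:
            self.m = np.zeros_like(w)
            self.v = np.zeros_like(w)
            self.v_hat = np.zeros_like(w)
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        self.v_hat = np.maximum(self.v_hat, self.v)  # AMSGrad's max over past second-moment estimates
        return w - self.lr * self.m / (np.sqrt(self.v_hat) + self.eps)
```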
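The Experiment Setup row quotes the reported hyperparameters. Below is a minimal sketch that wires those values (GD learning rate 0.01, a = −1, w1 ~ U[−1, 1], w2 ~ U[0, 1]) into a plain gradient-descent loop; the objective used here is a placeholder quadratic, since the paper's actual toy loss (its Sec. 6) is not reproduced in this entry, and the step count is an assumption.

```python
# Minimal sketch: the reported initialization and GD settings wired into a
# training loop. The loss is a PLACEHOLDER; substitute the gradient of the
# paper's Sec. 6 objective to attempt a faithful reproduction.
import numpy as np

rng = np.random.default_rng(0)

a = -1.0        # loss parameter reported in the paper (its role is defined there)
lr_gd = 0.01    # reported learning rate for plain GD
n_steps = 1000  # assumed step count; not stated in the quoted setup

def placeholder_grad(w):
    """Gradient of the placeholder loss L(w) = a * (w1^2 + w2^2) / 2."""
    return a * w

# Reported initialization: w1 uniform in [-1, 1], w2 uniform in [0, 1].
w = np.array([rng.uniform(-1.0, 1.0), rng.uniform(0.0, 1.0)])

for _ in range(n_steps):
    w = w - lr_gd * placeholder_grad(w)

print("final (w1, w2):", w)
```

Replacing `placeholder_grad` with the gradient of the paper's toy objective, and swapping the GD step for the `AMSGrad` class sketched above, would mirror the optimizer comparison the quoted setup describes.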