Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
SGD Can Converge to Local Maxima
Authors: Liu Ziyin, Botao Li, James B Simon, Masahito Ueda
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also realize results in a minimal neural network-like example. In Sec. 6, we present the numerical simulations, including a minimal example involving a neural network. |
| Researcher Affiliation | Academia | (1) The University of Tokyo; (2) ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris; (3) University of California, Berkeley |
| Pseudocode | No | The paper defines algorithms (e.g., SGD and AMSGrad) using mathematical equations (e.g., |
| Open Source Code | No | The paper does not provide any statement about releasing source code or links to a code repository. |
| Open Datasets | No | The paper uses |
| Dataset Splits | No | The paper does not specify explicit training/validation/test dataset splits. For the toy neural network example, it mentions |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running experiments (e.g., specific GPU/CPU models, memory, or cloud computing instances). |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | In this numerical example, we set λ = 0.8 and a = −1... we set λ = 0.2 and β2 = 0.999 for both Adam and AMSGrad. When momentum is used, we set β1 = 0.9. GD is run with a learning rate of 0.01. ...w1 is initialized uniformly in [−1,1]; w2 is initialized uniformly in [0,1]... at a small learning rate (λ = 0.001)... when the learning rate is large (λ = 0.1). |
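
The settings quoted in the Experiment Setup row can be gathered into a small sketch. The Python below is illustrative only. The dictionary keys, the `init_weights` and `amsgrad_step` helper names, the `eps` value, and the use of the standard AMSGrad update without bias correction are all assumptions rather than details taken from the paper. The quadratic loss in the demo is a hypothetical stand-in, since the paper's objectives are not reproduced on this page.

```python
import math
import random

# Values quoted in the Experiment Setup row; key names and grouping are illustrative.
SETUP = {
    "first_example": {"lr": 0.8, "a": -1.0},             # "we set λ = 0.8 and a = −1"
    "adam_amsgrad": {"lr": 0.2, "beta1": 0.9, "beta2": 0.999},
    "gd": {"lr": 0.01},
    "small_lr": 0.001,
    "large_lr": 0.1,
}


def init_weights(rng):
    """Draw w1 uniformly from [-1, 1] and w2 uniformly from [0, 1]."""
    return rng.uniform(-1.0, 1.0), rng.uniform(0.0, 1.0)


def amsgrad_step(w, grad, state, lr, beta1, beta2, eps=1e-8):
    """One scalar AMSGrad update (standard form, no bias correction; an assumption).

    `state` holds the running first moment `m`, second moment `v`, and the
    running maximum `v_hat` of the second moment. It is updated in place.
    """
    state["m"] = beta1 * state["m"] + (1.0 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1.0 - beta2) * grad * grad
    state["v_hat"] = max(state["v_hat"], state["v"])
    return w - lr * state["m"] / (math.sqrt(state["v_hat"]) + eps)


if __name__ == "__main__":
    rng = random.Random(0)
    w1, w2 = init_weights(rng)
    print("initial weights:", w1, w2)

    # Hypothetical loss f(w) = 0.5 * w**2, so the gradient is simply w.
    cfg = SETUP["adam_amsgrad"]
    state = {"m": 0.0, "v": 0.0, "v_hat": 0.0}
    w = w1
    for _ in range(5):
        w = amsgrad_step(w, w, state, cfg["lr"], cfg["beta1"], cfg["beta2"])
    print("w after 5 AMSGrad steps on the stand-in loss:", w)
```

Keeping the quoted values in a single dictionary makes it easier to see which settings the page reports and which (loss function, noise model, number of steps) would still have to be recovered from the paper itself.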