Nesterov acceleration despite very noisy gradients
Authors: Kanan Gupta, Jonathan W. Siegel, Stephan Wojtowytsch
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5: Numerical Experiments |
| Researcher Affiliation | Academia | Kanan Gupta, Department of Mathematics, University of Pittsburgh, kanan.g@pitt.edu; Jonathan W. Siegel, Department of Mathematics, Texas A&M University, jwsiegel@tamu.edu; Stephan Wojtowytsch, Department of Mathematics, University of Pittsburgh, s.woj@pitt.edu |
| Pseudocode | Yes | Algorithm 1: Accelerated Gradient descent with Noisy EStimators (AGNES) |
| Open Source Code | Yes | All the code used for the experiments in the paper has been provided in the supplementary materials. |
| Open Datasets | Yes | We trained ResNet-34 [He et al., 2016]... on the CIFAR-10 image dataset [Krizhevsky et al., 2009]... We tried various combinations of AGNES hyperparameters α and η to train LeNet-5 on the MNIST dataset |
| Dataset Splits | No | The resulting dataset was split into 90% training and 10% testing data. The paper specifies training and testing splits but does not mention a separate validation split. |
| Hardware Specification | No | The experiments in sections 5.3 and 5.4 were run on a single current generation GPU in a local cluster for up to 50 hours. This work used the H2P cluster, which is supported by NSF award number OAC-2117681. While it mentions "single current generation GPU" and "H2P cluster", it does not specify exact GPU/CPU models or detailed specifications. |
| Software Dependencies | No | All neural-network based experiments were performed using the PyTorch library. The paper mentions PyTorch but does not specify a version number. |
| Experiment Setup | Yes | We selected the learning rate 10⁻³ for Adam... For AGNES, NAG, and SGD, based on initial exploratory experiments, we used a learning rate of 10⁻⁴, a momentum value of 0.99, and for AGNES, a correction step size η = 10⁻³. We used the same initial learning rate 10⁻³ for all the algorithms, which was dropped to 10⁻⁴ after 25 epochs. A momentum value of 0.99 was used for SGD, NAG, and AGNES and a constant correction step size η = 10⁻² was used for AGNES. |
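The Pseudocode row refers to Algorithm 1 (AGNES) in the paper, which is not reproduced on this page. The snippet below is only a hedged sketch of a two-step-size Nesterov-style update with a gradient step size α, a correction step size η, and momentum ρ, using the default values quoted in the Experiment Setup row (α = 10⁻⁴, η = 10⁻³, ρ = 0.99). The exact placement of ρ and η in the velocity update is an assumption and should be checked against Algorithm 1; the sketch reduces to standard NAG when η = α.

```python
# Hedged sketch of a two-step-size Nesterov-style update. NOT verified against
# Algorithm 1 of the paper: the update rule below is an assumption. alpha is
# the gradient step size, eta the correction step size, rho the momentum.
# With eta == alpha this is exactly Nesterov's accelerated gradient.
import numpy as np

def agnes_style_step(x, v, grad_fn, alpha=1e-4, eta=1e-3, rho=0.99):
    """One update of the iterate x and velocity v.

    grad_fn returns a (possibly noisy) gradient estimate at the lookahead
    point x + v.
    """
    g = grad_fn(x + v)           # stochastic gradient at the lookahead point
    x_new = x + v - alpha * g    # gradient step taken from the lookahead point
    v_new = rho * (v - eta * g)  # velocity update with correction step eta
    return x_new, v_new

# Toy usage on a quadratic with additive gradient noise.
rng = np.random.default_rng(0)
grad = lambda z: 2.0 * z + 0.1 * rng.standard_normal(z.shape)
x, v = np.ones(3), np.zeros(3)
for _ in range(1000):
    x, v = agnes_style_step(x, v, grad)
```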
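For the Experiment Setup row, the following is a minimal sketch (not the authors' released code, which is in their supplementary materials) of the NAG baseline it describes: ResNet-34 on CIFAR-10 with Nesterov-momentum SGD, momentum 0.99, and an initial learning rate of 10⁻³ dropped to 10⁻⁴ after 25 epochs. The batch size, epoch budget, data transforms, and the assignment of this schedule to the CIFAR-10 run are assumptions; the AGNES correction step η belongs to Algorithm 1 and is not implemented here.

```python
import torch
import torchvision
import torchvision.transforms as T

device = "cuda" if torch.cuda.is_available() else "cpu"

# Plain ToTensor transform; the paper's preprocessing/augmentation is not specified here.
transform = T.Compose([T.ToTensor()])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128,  # batch size is an assumption
                                           shuffle=True)

model = torchvision.models.resnet34(num_classes=10).to(device)
criterion = torch.nn.CrossEntropyLoss()

# NAG baseline with the hyperparameters quoted in the Experiment Setup row.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3,
                            momentum=0.99, nesterov=True)
# Drop the learning rate from 1e-3 to 1e-4 after 25 epochs.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[25],
                                                 gamma=0.1)

for epoch in range(50):  # epoch budget is an assumption
    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```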