Nesterov acceleration despite very noisy gradients

Authors: Kanan Gupta, Jonathan W. Siegel, Stephan Wojtowytsch

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "5 Numerical Experiments"
Researcher Affiliation | Academia | Kanan Gupta, Department of Mathematics, University of Pittsburgh (kanan.g@pitt.edu); Jonathan W. Siegel, Department of Mathematics, Texas A&M University (jwsiegel@tamu.edu); Stephan Wojtowytsch, Department of Mathematics, University of Pittsburgh (s.woj@pitt.edu)
Pseudocode | Yes | Algorithm 1: Accelerated Gradient descent with Noisy EStimators (AGNES)
Open Source Code | Yes | "All the code used for the experiments in the paper has been provided in the supplementary materials."
Open Datasets | Yes | "We trained ResNet-34 [He et al., 2016]... on the CIFAR-10 image dataset [Krizhevsky et al., 2009]... We tried various combinations of AGNES hyperparameters α and η to train LeNet-5 on the MNIST dataset"
Dataset Splits | No | "The resulting dataset was split into 90% training and 10% testing data." The paper specifies training and testing splits but does not mention a separate validation split.
Hardware Specification | No | "The experiments in sections 5.3 and 5.4 were run on a single current generation GPU in a local cluster for up to 50 hours. This work used the H2P cluster, which is supported by NSF award number OAC-2117681." While it mentions a "single current generation GPU" and the "H2P cluster", it does not specify exact GPU/CPU models or detailed specifications.
Software Dependencies | No | "All neural-network based experiments were performed using the PyTorch library." The paper mentions PyTorch but does not specify a version number.
Experiment Setup | Yes | "We selected the learning rate 10⁻³ for Adam... For AGNES, NAG, and SGD, based on initial exploratory experiments, we used a learning rate of 10⁻⁴, a momentum value of 0.99, and for AGNES, a correction step size η = 10⁻³." "We used the same initial learning rate 10⁻³ for all the algorithms, which was dropped to 10⁻⁴ after 25 epochs. A momentum value of 0.99 was used for SGD, NAG, and AGNES and a constant correction step size η = 10⁻² was used for AGNES."
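The paper's Algorithm 1 (AGNES) is a Nesterov-type method that adds a separate correction step size η to the usual learning rate; its exact update rule is given in the paper itself. For orientation, the classical NAG baseline it is compared against, run with a noisy gradient oracle, can be sketched in pure Python. The quadratic objective, noise level, and step counts below are illustrative assumptions, not values from the paper:

```python
import random

def noisy_grad(x, sigma=0.1):
    # Gradient of the toy objective f(x) = x^2 / 2, corrupted by
    # Gaussian noise (a hypothetical stand-in for minibatch noise).
    return x + random.gauss(0.0, sigma)

def nag(x0, lr=1e-2, momentum=0.9, steps=500, seed=0):
    # Classical Nesterov accelerated gradient with a noisy oracle.
    # The gradient is evaluated at the look-ahead point x + momentum * v.
    # AGNES additionally decouples the step size used for the momentum
    # update (the correction step size eta) from the learning rate;
    # this sketch shows only the classical NAG baseline.
    random.seed(seed)
    x, v = x0, 0.0
    for _ in range(steps):
        g = noisy_grad(x + momentum * v)
        v = momentum * v - lr * g
        x = x + v
    return x

x_final = nag(5.0)
```

With the paper's reported settings (momentum 0.99, learning rate 10⁻⁴), convergence on such a toy problem would be far slower; the values above are chosen only so the sketch visibly contracts toward the minimizer within a few hundred steps.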