Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Reducing the variance in online optimization by transporting past gradients

Authors: Sébastien Arnold, Pierre-Antoine Manzagol, Reza Babanezhad Harikandeh, Ioannis Mitliagkas, Nicolas Le Roux

NeurIPS 2019 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We show experimentally that it achieves state-of-the-art results on a wide range of architectures and benchmarks. Additionally, the IGT gradient estimator yields the optimal asymptotic convergence rate for online stochastic optimization in the restricted setting where the Hessians of all component functions are equal.
Researcher Affiliation Collaboration Sébastien M. R. Arnold, University of Southern California, Los Angeles, CA, EMAIL; Pierre-Antoine Manzagol, Google Brain, Montréal, QC, EMAIL; Reza Babanezhad, University of British Columbia, Vancouver, BC, EMAIL; Ioannis Mitliagkas, Mila, Université de Montréal, Montréal, QC, EMAIL; Nicolas Le Roux, Mila, Google Brain, Montréal, QC, EMAIL
Pseudocode Yes Algorithm 1 Heavyball-IGT
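The paper's Algorithm 1 combines the IGT (implicit gradient transport) estimator with heavy-ball momentum. As a rough illustration only, the sketch below reconstructs the core idea from the paper's description: average past gradients with weight gamma_t = t/(t+1), but evaluate each new gradient at a point extrapolated from the last parameter step so the average tracks the gradient at the current iterate. All function and variable names here are illustrative, not taken from the authors' released code, and details may differ from the official Algorithm 1.

```python
import numpy as np

def heavyball_igt(grad_fn, theta0, lr=0.1, mu=0.9, steps=100):
    """Illustrative sketch of a Heavyball-IGT-style update (not the
    reference implementation; see https://github.com/seba-1511/igt.pth).

    IGT "transports" past gradients: with gamma_t = t/(t+1), the new
    gradient is taken at an extrapolated point
        theta_t + (gamma_t / (1 - gamma_t)) * (theta_t - theta_prev)
    and averaged as g_t = gamma_t * g_{t-1} + (1 - gamma_t) * grad(...).
    Note gamma_t / (1 - gamma_t) simplifies to t.
    A heavy-ball momentum step is then applied to the averaged gradient.
    """
    theta = np.asarray(theta0, dtype=float)
    theta_prev = theta.copy()
    g = np.zeros_like(theta)   # transported gradient estimate
    v = np.zeros_like(theta)   # heavy-ball velocity
    for t in range(steps):
        gamma = t / (t + 1.0)
        shift = t * (theta - theta_prev)        # = gamma/(1-gamma) * step
        g = gamma * g + (1.0 - gamma) * grad_fn(theta + shift)
        v = mu * v - lr * g
        theta_prev = theta.copy()
        theta = theta + v
    return theta
```

On a deterministic quadratic (where all component Hessians are trivially equal), this recursion makes g_t coincide with the exact gradient at theta_t, which is the regime in which the paper proves its optimal-rate result.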
Open Source Code Yes Open-source implementation available at: https://github.com/seba-1511/igt.pth
Open Datasets Yes CIFAR10 image classification We first consider the task of training a ResNet-56 model [12] on the CIFAR-10 image classification dataset [19]. ... ImageNet image classification We also consider the task of training a ResNet-50 model [12] on the larger ImageNet dataset [36]. ... IMDb sentiment analysis We train a bi-directional LSTM on the IMDb Large Movie Review Dataset [27] for 200 epochs. ... Mini-Imagenet dataset [34].
Dataset Splits No The paper mentions using validation sets and reports 'validation accuracies', but does not provide specific details on the train/validation/test dataset splits (e.g., percentages or sample counts) for reproducibility.
Hardware Specification No The paper does not specify any particular hardware used for running experiments, such as GPU models, CPU types, or cloud computing instance details.
Software Dependencies No The paper mentions 'We used TF official models code and setup [1]' but does not provide specific version numbers for TensorFlow or any other software dependencies needed to replicate the experiments.
Experiment Setup Yes We tuned the step size for each algorithm by running experiments using a logarithmic grid. ... We used a linearly decreasing stepsize as it was shown to be simple and perform well [43]. ... For each optimizer we selected the hyperparameter combination that is fastest to reach a consistently attainable target train loss [43]. ... we trained using larger minibatches (1024 instead of 128). ... We train a bi-directional LSTM on the IMDb Large Movie Review Dataset for 200 epochs. ... We replicate the 5 ways classification setup with 5 adaptation steps on tasks from the Mini-Imagenet dataset [34]. ... select the stepsize that maximizes the validation accuracy after 10K iterations, and use it to train the model for 100K iterations.
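The tuning protocol quoted above (a logarithmic step-size grid, keeping the hyperparameter that is fastest to reach a target train loss) can be sketched as follows. The toy training loop and helper names below are invented for illustration and stand in for the paper's actual training runs.

```python
def iterations_to_target(lr, target=1e-3, max_iters=1000):
    # Stand-in "training run": plain gradient descent on f(x) = 0.5 * x**2
    # from x = 1. Returns the first iteration whose loss meets the target,
    # or None if the target is never reached within the budget.
    x = 1.0
    for i in range(max_iters):
        if 0.5 * x * x <= target:
            return i
        x -= lr * x
    return None

def pick_step_size(candidates, **kwargs):
    # Mirror the quoted protocol: among step sizes that reach the target
    # loss at all, keep the one that reaches it in the fewest iterations.
    results = {lr: iterations_to_target(lr, **kwargs) for lr in candidates}
    feasible = {lr: n for lr, n in results.items() if n is not None}
    return min(feasible, key=feasible.get) if feasible else None

grid = [10.0 ** e for e in range(-4, 1)]  # logarithmic grid: 1e-4 ... 1
best = pick_step_size(grid)
```

On this toy quadratic the largest stable step size wins outright; in the paper's setting the same selection is run per optimizer over real training curves, with the target chosen to be "consistently attainable" across seeds.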