Reducing the variance in online optimization by transporting past gradients
Authors: Sébastien Arnold, Pierre-Antoine Manzagol, Reza Babanezhad Harikandeh, Ioannis Mitliagkas, Nicolas Le Roux
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show experimentally that it achieves state-of-the-art results on a wide range of architectures and benchmarks. Additionally, the IGT gradient estimator yields the optimal asymptotic convergence rate for online stochastic optimization in the restricted setting where the Hessians of all component functions are equal. |
| Researcher Affiliation | Collaboration | Sébastien M. R. Arnold, University of Southern California, Los Angeles, CA (seb.arnold@usc.edu); Pierre-Antoine Manzagol, Google Brain, Montréal, QC (manzagop@google.com); Reza Babanezhad, University of British Columbia, Vancouver, BC (rezababa@cs.ubc.ca); Ioannis Mitliagkas, Mila, Université de Montréal, Montréal, QC (ioannis@iro.umontreal.ca); Nicolas Le Roux, Mila, Google Brain, Montréal, QC (nlr@google.com) |
| Pseudocode | Yes | Algorithm 1 Heavyball-IGT |
| Open Source Code | Yes | Open-source implementation available at: https://github.com/seba-1511/igt.pth |
| Open Datasets | Yes | CIFAR-10 image classification: We first consider the task of training a ResNet-56 model [12] on the CIFAR-10 image classification dataset [19]. ... ImageNet image classification: We also consider the task of training a ResNet-50 model [12] on the larger ImageNet dataset [36]. ... IMDb sentiment analysis: We train a bi-directional LSTM on the IMDb Large Movie Review Dataset [27] for 200 epochs. ... Mini-Imagenet dataset [34]. |
| Dataset Splits | No | The paper mentions using validation sets and reports 'validation accuracies', but does not provide specific details on the train/validation/test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper does not specify any particular hardware used for running experiments, such as GPU models, CPU types, or cloud computing instance details. |
| Software Dependencies | No | The paper mentions 'We used TF official models code and setup [1]' but does not provide specific version numbers for TensorFlow or any other software dependencies needed to replicate the experiments. |
| Experiment Setup | Yes | We tuned the step size for each algorithm by running experiments using a logarithmic grid. ... We used a linearly decreasing stepsize as it was shown to be simple and perform well [43]. ... For each optimizer we selected the hyperparameter combination that is fastest to reach a consistently attainable target train loss [43]. ... we trained using larger minibatches (1024 instead of 128). ... We train a bi-directional LSTM on the IMDb Large Movie Review Dataset for 200 epochs. ... We replicate the 5 ways classification setup with 5 adaptation steps on tasks from the Mini-Imagenet dataset [34]. ... select the stepsize that maximizes the validation accuracy after 10K iterations, and use it to train the model for 100K iterations. |
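For readers checking the pseudocode claim above, the Heavyball-IGT update (Algorithm 1 in the paper) can be sketched in a few lines: the stochastic gradient is evaluated at a "transported" extrapolation point, averaged into a running estimate with weight `gamma_t = t / (t + 1)`, and then fed to a heavy-ball step. The NumPy version below is an illustrative sketch on a hypothetical noisy quadratic, not the authors' PyTorch implementation (see the `igt.pth` repository for that); the toy objective, function names, and hyperparameter values are assumptions made for the example.

```python
import numpy as np

def noisy_grad(theta, target, rng, noise=0.1):
    """Stochastic gradient of the toy objective 0.5*||theta - target||^2
    (an illustrative stand-in for a minibatch gradient)."""
    return (theta - target) + noise * rng.standard_normal(theta.shape)

def heavyball_igt(target, steps=500, lr=0.1, momentum=0.9, seed=0):
    """Sketch of Heavyball-IGT on the toy quadratic above."""
    rng = np.random.default_rng(seed)
    theta = np.zeros_like(target)
    theta_prev = theta.copy()
    v = noisy_grad(theta, target, rng)       # IGT gradient estimate
    w = -lr * v                              # heavy-ball velocity
    theta, theta_prev = theta + w, theta
    for t in range(1, steps):
        gamma = t / (t + 1.0)
        # Transport: evaluate the gradient at the extrapolated point
        # theta + gamma/(1-gamma) * (theta - theta_prev), then average.
        shifted = theta + (gamma / (1.0 - gamma)) * (theta - theta_prev)
        v = gamma * v + (1.0 - gamma) * noisy_grad(shifted, target, rng)
        w = momentum * w - lr * v
        theta, theta_prev = theta + w, theta
    return theta

target = np.array([1.0, -2.0, 3.0])
est = heavyball_igt(target)  # should land near `target` as noise averages out
```

The extrapolation factor `gamma/(1-gamma)` equals `t`, which is what lets the uniform average of past gradients act like a gradient at the current iterate in the equal-Hessian setting the paper analyzes.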