Stochastic Optimization with Laggard Data Pipelines
Authors: Naman Agarwal, Rohan Anil, Tomer Koren, Kunal Talwar, Cyril Zhang
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present numerical experiments on convex machine learning benchmarks. This acts as a validation of our theoretical findings, as well as a way to examine beyond-worst-case phenomena not captured by our minimax convergence guarantees. Figures 2 and 3 show our findings. As batch size increases, there is a phase transition from a variance-dominated regime (the variance term in our analysis is larger) to a bias-dominated regime (the bias term is larger). In the former regime, data-echoed SGD saturates on the stale data, and the optimal learning rate scales inversely with the echoing factor, as predicted by the theory. In the latter regime, echoing attains a nearly embarrassingly-parallel speedup, and the optimal learning rate is close to constant. |
| Researcher Affiliation | Collaboration | Naman Agarwal, Google AI Princeton, Princeton, NJ 08540, namanagarwal@google.com; Rohan Anil, Google Research, Mountain View, CA 94043, rohananil@google.com; Tomer Koren, Tel Aviv University & Google, Tel Aviv, Israel, tkoren@tauex.tau.ac.il; Kunal Talwar, Apple, Cupertino, CA 95014, ktalwar@apple.com; Cyril Zhang, Microsoft Research, New York, NY 10012, cyrilzhang@microsoft.com |
| Pseudocode | Yes | "Algorithm 1 Data echoing meta-algorithm" and "Algorithm 2 Data-echoing meta-algorithm (final iterate)". (An illustrative sketch of such a data-echoing loop appears after the table.) |
| Open Source Code | No | The paper does not contain any statement about releasing open-source code for the methodology described, nor does it provide any links to such code. |
| Open Datasets | Yes | We consider two logistic regression problems as a benchmark, the scaled Cover Type dataset from the UCI repository [19], and MNIST [30]. |
| Dataset Splits | No | The paper mentions using the Cover Type and MNIST datasets and tuning a learning rate, implying training and validation, but it does not provide specific details on the dataset splits (percentages, counts, or explicit reference to standard splits) within the main text. It states "All details can be found in the supplementary material" but this is not accessible for this analysis. |
| Hardware Specification | No | The paper discusses hardware concepts like GPUs, TPUs, FPGAs, and SSD storage in the context of performance bottlenecks in deep learning, but it does not specify the particular hardware (e.g., specific GPU models, CPU types, or cloud instances) used to conduct the experiments described in Section 5. |
| Software Dependencies | No | The paper mentions TensorFlow in a citation context related to I/O workloads but does not list any specific software dependencies, such as libraries, frameworks, or operating systems, with version numbers that were used to run its experiments. |
| Experiment Setup | Yes | For each choice of batch size and echoing factor, a constant learning rate is tuned by grid search to minimize this time. (A sketch of this kind of tuning loop appears below the table.) |
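
The data-echoing meta-algorithm referenced in the Pseudocode row reuses each fresh batch for several consecutive gradient steps while the data pipeline catches up. The following is a minimal illustrative sketch of that idea, not a reproduction of the paper's Algorithm 1 or 2; the function names, the `echo_factor` parameter, and the toy least-squares usage are assumptions made for illustration.

```python
# Minimal sketch of data-echoed SGD: each fresh batch from the (slow) data
# pipeline is reused for `echo_factor` consecutive gradient steps before the
# next batch arrives. Names and interface are illustrative only.
import numpy as np

def data_echoed_sgd(grad_fn, batches, w0, lr, echo_factor):
    """grad_fn(w, batch) -> stochastic gradient; `batches` iterates over fresh
    batches; `echo_factor` is the number of (stale) reuses per fresh batch."""
    w = np.asarray(w0, dtype=float)
    for batch in batches:                 # one fresh batch per pipeline round
        for _ in range(echo_factor):      # echoed steps on the same batch
            w = w - lr * grad_fn(w, batch)
    return w                              # final iterate (cf. "final iterate" variant)

# Toy usage on a least-squares objective (illustrative only).
rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 10)), rng.normal(size=1000)

def grad_fn(w, idx):
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / len(idx)

batches = (rng.integers(0, 1000, size=32) for _ in range(100))
w_final = data_echoed_sgd(grad_fn, batches, np.zeros(10), lr=0.05, echo_factor=4)
```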
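
The Experiment Setup row describes tuning a constant learning rate per configuration by grid search. Below is a hedged sketch of that tuning protocol; `run_to_target` is a hypothetical helper (assumed to train with the given settings and return the time or step count needed to reach a fixed target loss), and the grid values are placeholders, not the paper's actual values.

```python
# Sketch of the per-configuration learning-rate grid search: for each
# (batch_size, echo_factor) pair, pick the constant learning rate that
# minimizes time-to-target. `run_to_target` and the grids are hypothetical.
import itertools

LR_GRID = [10.0 ** e for e in range(-4, 1)]   # illustrative grid
BATCH_SIZES = [16, 64, 256]                   # illustrative values
ECHO_FACTORS = [1, 2, 4, 8]

def tune(run_to_target):
    """run_to_target(lr, batch_size, echo_factor) -> time to reach the target
    loss (e.g. float('inf') if the run diverges)."""
    best = {}
    for b, k in itertools.product(BATCH_SIZES, ECHO_FACTORS):
        times = {lr: run_to_target(lr=lr, batch_size=b, echo_factor=k)
                 for lr in LR_GRID}
        best_lr = min(times, key=times.get)
        best[(b, k)] = (best_lr, times[best_lr])
    return best  # best constant learning rate and its time-to-target, per configuration
```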