Stochastic Optimization with Laggard Data Pipelines
Authors: Naman Agarwal, Rohan Anil, Tomer Koren, Kunal Talwar, Cyril Zhang
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present numerical experiments on convex machine learning benchmarks. This acts as a validation of our theoretical findings, as well as a way to examine beyond-worst-case phenomena not captured by our minimax convergence guarantees. Figures 2 and 3 show our findings. As batch size increases, there is a phase transition from a variance-dominated regime (the variance term in our analysis is larger) to a bias-dominated regime (the bias term is larger). In the former regime, data-echoed SGD saturates on the stale data, and the optimal learning rate scales inversely with the echoing factor, as predicted by the theory. In the latter regime, echoing attains a nearly embarrassingly-parallel speedup, and the optimal learning rate is close to constant. |
| Researcher Affiliation | Collaboration | Naman Agarwal, Google AI Princeton, Princeton, NJ 08540, namanagarwal@google.com; Rohan Anil, Google Research, Mountain View, CA 94043, rohananil@google.com; Tomer Koren, Tel Aviv University & Google, Tel Aviv, Israel, tkoren@tauex.tau.ac.il; Kunal Talwar, Apple, Cupertino, CA 95014, ktalwar@apple.com; Cyril Zhang, Microsoft Research, New York, NY 10012, cyrilzhang@microsoft.com |
| Pseudocode | Yes | "Algorithm 1 Data echoing meta-algorithm" and "Algorithm 2 Data-echoing meta-algorithm (final iterate)". (An illustrative sketch of such a data-echoing loop appears after the table.) |
| Open Source Code | No | The paper does not contain any statement about releasing open-source code for the methodology described, nor does it provide any links to such code. |
| Open Datasets | Yes | We consider two logistic regression problems as a benchmark, the scaled Cover Type dataset from the UCI repository [19], and MNIST [30]. |
| Dataset Splits | No | The paper mentions using the Cover Type and MNIST datasets and tuning a learning rate, implying training and validation, but it does not provide specific details on the dataset splits (percentages, counts, or explicit reference to standard splits) within the main text. It states "All details can be found in the supplementary material" but this is not accessible for this analysis. |
| Hardware Specification | No | The paper discusses hardware concepts like GPUs, TPUs, FPGAs, and SSD storage in the context of performance bottlenecks in deep learning, but it does not specify the particular hardware (e.g., specific GPU models, CPU types, or cloud instances) used to conduct the experiments described in Section 5. |
| Software Dependencies | No | The paper mentions TensorFlow in a citation context related to I/O workloads but does not list any specific software dependencies, such as libraries, frameworks, or operating systems, with version numbers that were used to run its experiments. |
| Experiment Setup | Yes | For each choice of batch size and echoing factor, a constant learning rate is tuned by grid search to minimize this time. (A sketch of this kind of tuning loop appears below the table.) |
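
The data-echoing meta-algorithm referenced in the Pseudocode row reuses each fresh batch for several consecutive gradient steps while the data pipeline catches up. The following is a minimal illustrative sketch of that idea, not a reproduction of the paper's Algorithm 1 or 2; the function names, the `echo_factor` parameter, and the toy least-squares usage are assumptions made for illustration.

```python
# Minimal sketch of data-echoed SGD: each fresh batch from the (slow) data
# pipeline is reused for `echo_factor` consecutive gradient steps before the
# next batch arrives. Names and interface are illustrative only.
import numpy as np

def data_echoed_sgd(grad_fn, batches, w0, lr, echo_factor):
    """grad_fn(w, batch) -> stochastic gradient; `batches` iterates over fresh
    batches; `echo_factor` is the number of (stale) reuses per fresh batch."""
    w = np.asarray(w0, dtype=float)
    for batch in batches:                 # one fresh batch per pipeline round
        for _ in range(echo_factor):      # echoed steps on the same batch
            w = w - lr * grad_fn(w, batch)
    return w                              # final iterate (cf. "final iterate" variant)

# Toy usage on a least-squares objective (illustrative only).
rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 10)), rng.normal(size=1000)

def grad_fn(w, idx):
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / len(idx)

batches = (rng.integers(0, 1000, size=32) for _ in range(100))
w_final = data_echoed_sgd(grad_fn, batches, np.zeros(10), lr=0.05, echo_factor=4)
```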
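
The Experiment Setup row describes tuning a constant learning rate per configuration by grid search. Below is a hedged sketch of that tuning protocol; `run_to_target` is a hypothetical helper (assumed to train with the given settings and return the time or step count needed to reach a fixed target loss), and the grid values are placeholders, not the paper's actual values.

```python
# Sketch of the per-configuration learning-rate grid search: for each
# (batch_size, echo_factor) pair, pick the constant learning rate that
# minimizes time-to-target. `run_to_target` and the grids are hypothetical.
import itertools

LR_GRID = [10.0 ** e for e in range(-4, 1)]   # illustrative grid
BATCH_SIZES = [16, 64, 256]                   # illustrative values
ECHO_FACTORS = [1, 2, 4, 8]

def tune(run_to_target):
    """run_to_target(lr, batch_size, echo_factor) -> time to reach the target
    loss (e.g. float('inf') if the run diverges)."""
    best = {}
    for b, k in itertools.product(BATCH_SIZES, ECHO_FACTORS):
        times = {lr: run_to_target(lr=lr, batch_size=b, echo_factor=k)
                 for lr in LR_GRID}
        best_lr = min(times, key=times.get)
        best[(b, k)] = (best_lr, times[best_lr])
    return best  # best constant learning rate and its time-to-target, per configuration
```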