Distributed Stochastic Optimization via Adaptive SGD
Authors: Ashok Cutkosky, Róbert Busa-Fekete
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We implement our algorithm in the Spark distributed framework and exhibit dramatic performance gains on large-scale logistic regression problems." and "To verify our theoretical results, we carried out experiments on large-scale (order 100 million datapoints) public datasets..." |
| Researcher Affiliation | Collaboration | "Ashok Cutkosky Stanford University, USA cutkosky@google.com" and "Róbert Busa-Fekete Yahoo! Research, New York, USA busafekete@oath.com" (now at Google) |
| Pseudocode | Yes | Algorithm 1 SVRG OL (SVRG with Online Learning) |
| Open Source Code | No | No explicit statement providing access to the source code for the methodology described in this paper was found. |
| Open Datasets | Yes | "To verify our theoretical results, we carried out experiments on large-scale (order 100 million datapoints) public datasets, such as KDD10 and KDD12" (footnote 3: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html) |
| Dataset Splits | Yes | "The main statistics of the datasets are shown in Table 2." and "We measure the number of communication rounds, the total training error, the error on a held-out test set, the Area Under the Curve (AUC), and total runtime in minutes." |
| Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or cloud instance types) used for running the experiments were provided. The paper only mentions 'Spark distributed framework'. |
| Software Dependencies | Yes | "We tested two well-known scalable logistic regression implementations: Spark ML 2.2.0 and Vowpal Wabbit 7.10.0 (VW)" |
| Experiment Setup | Yes | "Our theoretical analysis asks for exponentially increasing serial phase lengths Tk and a batch size of N̂ = T². In practice we use slightly different settings. We have a constant serial phase length Tk = T0 for all k, and an increasing batch size N̂k = kC for some constant C. We usually set C = T0." and "We initially divide the training data into C approximately 100MB chunks, and we use min(1000, C) executors." |
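The setup row above describes a simple schedule: a constant serial phase length Tk = T0 and a batch size N̂k = kC that grows linearly with the phase index, with C usually set to T0. A minimal sketch of that schedule in Python (all names are hypothetical; this illustrates only the practical parameter choices, not the paper's Algorithm 1, SVRG OL):

```python
# Sketch of the practical phase schedule reported in the Experiment Setup row:
# constant serial phase length Tk = T0 and growing batch size N_k = k * C.
# Function and variable names are illustrative, not from the paper's code.

def phase_schedule(num_phases, T0, C=None):
    """Yield (phase_index, serial_phase_length, batch_size) triples."""
    if C is None:
        C = T0  # the paper reports usually setting C = T0
    for k in range(1, num_phases + 1):
        Tk = T0      # constant serial phase length for every phase
        Nk = k * C   # batch size grows linearly with the phase index k
        yield k, Tk, Nk

for k, Tk, Nk in phase_schedule(num_phases=5, T0=1000):
    print(f"phase {k}: serial steps = {Tk}, batch size = {Nk}")
```

This contrasts with the theoretical prescription quoted above (exponentially increasing Tk and batch size N̂ = T²), which the authors relax in practice.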