Distributed Stochastic Optimization via Adaptive SGD

Authors: Ashok Cutkosky, Róbert Busa-Fekete

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We implement our algorithm in the Spark distributed framework and exhibit dramatic performance gains on large-scale logistic regression problems." and "To verify our theoretical results, we carried out experiments on large-scale (order 100 million datapoints) public datasets..."
Researcher Affiliation | Collaboration | Ashok Cutkosky, Stanford University, USA (cutkosky@google.com) and Róbert Busa-Fekete, Yahoo! Research, New York, USA (busafekete@oath.com); now at Google
Pseudocode | Yes | "Algorithm 1 SVRG OL (SVRG with Online Learning)"
Open Source Code | No | No explicit statement providing access to the source code for the methodology described in this paper was found.
Open Datasets | Yes | "To verify our theoretical results, we carried out experiments on large-scale (order 100 million datapoints) public datasets, such as KDD10 and KDD12" (footnote: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html); see the loading sketch after the table.
Dataset Splits | Yes | "The main statistics of the datasets are shown in Table 2." and "We measure the number of communication rounds, the total training error, the error on a held-out test set, the Area Under the Curve (AUC), and total runtime in minutes."
Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or cloud instance types) used for running the experiments were provided; the paper only mentions the Spark distributed framework.
Software Dependencies | Yes | "We tested two well-known scalable logistic regression implementations: Spark ML 2.2.0 and Vowpal Wabbit 7.10.0 (VW)"; see the PySpark sketch after the table.
Experiment Setup | Yes | "Our theoretical analysis asks for exponentially increasing serial phase lengths T_k and a batch size of N̂ = T^2. In practice we use slightly different settings. We have a constant serial phase length T_k = T_0 for all k, and an increasing batch size N̂_k = kC for some constant C. We usually set C = T_0." and "We initially divide the training data into C approximately 100M chunks, and we use min(1000, C) executors." See the schedule sketch after the table.
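
The datasets in the Open Datasets row (KDD10, KDD12) are distributed in LIBSVM format at the URL above. As a minimal sketch, assuming scikit-learn and a downloaded, decompressed copy at the hypothetical local path ./kdd12, the data can be read into a sparse matrix as follows; at the full ~100 million datapoint scale the paper instead processes the data in Spark:

    # Minimal sketch: read a LIBSVM-format file such as KDD12 into a sparse matrix.
    # "./kdd12" is a hypothetical local path to a downloaded, decompressed copy.
    from sklearn.datasets import load_svmlight_file

    X, y = load_svmlight_file("./kdd12")  # X: scipy.sparse CSR matrix, y: labels
    print(X.shape, y.shape)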
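
The Software Dependencies row names Spark ML 2.2.0 as one of the two logistic regression baselines. The following is a minimal PySpark sketch of such a baseline, not the authors' configuration; the input path, split, iteration count, and regularization strength are all hypothetical:

    # Minimal PySpark sketch of a Spark ML logistic regression baseline.
    # Path, split, maxIter, and regParam are hypothetical, not the paper's settings.
    from pyspark.sql import SparkSession
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    spark = SparkSession.builder.appName("lr-baseline").getOrCreate()
    data = spark.read.format("libsvm").load("hdfs:///data/kdd12")  # expects 0/1 labels
    train, test = data.randomSplit([0.9, 0.1], seed=0)

    model = LogisticRegression(maxIter=100, regParam=1e-6).fit(train)
    auc = BinaryClassificationEvaluator(metricName="areaUnderROC").evaluate(model.transform(test))
    print("held-out AUC:", auc)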
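
The Experiment Setup row describes the schedule used in practice: a constant serial phase length T_k = T_0 and an anchor batch size N̂_k = kC that grows across phases, with C = T_0. The sketch below is a generic SVRG-style skeleton that only illustrates this schedule; it is not the paper's Algorithm 1 (SVRG OL), which tunes step sizes with an online learning subroutine and computes the batch gradients in parallel. The fixed step size eta and the user-supplied grad function are assumptions for illustration:

    import numpy as np

    def phase_scheduled_svrg(grad, w0, n, T0, num_phases, eta=0.1, seed=0):
        """Generic SVRG-style loop following the quoted phase/batch schedule.

        grad(w, idx) -> average gradient over the examples indexed by idx (user-supplied).
        Phase k draws an anchor batch of size N_k = k*C with C = T0, then runs a
        serial phase of constant length T0. The fixed step size eta stands in for
        the online-learning step-size tuning of the paper's SVRG OL.
        """
        rng = np.random.default_rng(seed)
        w, C = w0.copy(), T0
        for k in range(1, num_phases + 1):
            batch = rng.choice(n, size=min(k * C, n), replace=False)
            w_anchor = w.copy()
            mu = grad(w_anchor, batch)            # variance-reduction anchor gradient
            for _ in range(T0):                   # constant-length serial phase
                i = rng.integers(n, size=1)
                w -= eta * (grad(w, i) - grad(w_anchor, i) + mu)
        return w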