Communication Efficient Distributed Machine Learning with the Parameter Server

Authors: Mu Li, David G. Andersen, Alexander J. Smola, Kai Yu

NeurIPS 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present an in-depth analysis of two large scale machine learning problems ranging from ℓ1-regularized logistic regression on CPUs to reconstruction ICA on GPUs, using 636TB of real data with hundreds of billions of samples and dimensions. We demonstrate using these examples that the parameter server framework is an effective and straightforward way to scale machine learning to larger problems and systems than have been previously achieved.
Researcher Affiliation | Collaboration | Mu Li, David G. Andersen, Alexander Smola, and Kai Yu; Carnegie Mellon University, Baidu, Google; {muli, dga}@cs.cmu.edu, alex@smola.org, yukai@baidu.com
Pseudocode | Yes | Algorithm 1 Distributed Subgradient Descent Solving (1) in the Parameter Server
Open Source Code | Yes | Finally, the source codes are available at http://parameterserver.org.
Open Datasets | No | We collected an ad click prediction dataset with 170 billion samples and 65 billion unique features. The uncompressed dataset size is 636TB.
Dataset Splits | No | The paper mentions data partitioning for distributed processing (e.g., 'training data is partitioned and distributed among all the workers'), but it does not specify train/validation/test dataset splits, percentages, or methodology for reproducibility.
Hardware Specification | Yes | We ran the parameter server on 1000 machines, each with 16 CPU cores, 192GB DRAM, and connected by 10 Gb Ethernet.
Software Dependencies | No | The paper mentions several related systems and frameworks (e.g., Hadoop, Spark, Mahout, Graphlab) in its background, but it does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) required to replicate their experimental setup or run their code.
Experiment Setup | Yes | We adopted Algorithm 2 with upper bounds of the diagonal entries of the Hessian as the coordinate-specific learning rates. Features were randomly split into 580 blocks according to the feature group information. We chose a fixed learning rate by observing the convergence speed.
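
For context on the ℓ1-regularized logistic regression task quoted in the Research Type row, the objective usually referred to as problem (1) in this line of work takes the standard form below. The notation (n samples x_i with labels y_i in {-1, +1}, weight vector w, regularization weight λ) is the conventional one and is not copied verbatim from the paper.

```latex
% l1-regularized logistic regression in its standard form
% (conventional notation; the paper states its problem (1) in its own notation)
\min_{w \in \mathbb{R}^d} \;
  \sum_{i=1}^{n} \log\!\bigl(1 + \exp\bigl(-y_i \langle x_i, w \rangle\bigr)\bigr)
  \;+\; \lambda \lVert w \rVert_1
```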
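
The Pseudocode row cites Algorithm 1, Distributed Subgradient Descent in the Parameter Server. Below is a minimal single-process sketch of that worker/server pattern: workers pull the coordinates their data touches, compute local subgradients, and push them back, while the server applies the updates. The class and method names (Server, Worker, pull, push) are illustrative placeholders, not the API of the released code at http://parameterserver.org.

```python
import numpy as np

# Minimal single-process sketch of distributed subgradient descent with a
# parameter server, in the spirit of Algorithm 1 of the paper. Names and
# structure are illustrative only.

class Server:
    """Holds the global weight vector and applies pushed (sub)gradients."""
    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.lr = lr

    def pull(self, keys):
        # Workers request only the coordinates they need.
        return self.w[keys]

    def push(self, keys, grad):
        # Subgradient step on the pushed coordinates.
        self.w[keys] -= self.lr * grad

class Worker:
    """Owns a partition of the training data and computes local subgradients."""
    def __init__(self, X, y, reg=1e-4):
        self.X, self.y, self.reg = X, y, reg
        # All coordinates here; a real system would pull only the keys that
        # actually appear in the local (sparse) data.
        self.keys = np.arange(X.shape[1])

    def step(self, server):
        w = server.pull(self.keys)
        margin = self.y * (self.X @ w)
        # Subgradient of logistic loss plus an l1 term on the local partition.
        coef = -self.y / (1.0 + np.exp(margin))
        grad = self.X.T @ coef / len(self.y) + self.reg * np.sign(w)
        server.push(self.keys, grad)

# Usage: partition a toy dataset across "workers" and iterate.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 10)), rng.choice([-1.0, 1.0], size=200)
server = Server(dim=10)
workers = [Worker(X[i::4], y[i::4]) for i in range(4)]
for _ in range(50):
    for wk in workers:
        wk.step(server)
```

In the actual system the key space is sharded across many server nodes and workers push and pull asynchronously over the network; none of that machinery is modeled in this sketch.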
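
The Experiment Setup row mentions using upper bounds of the diagonal Hessian entries as coordinate-specific learning rates. For logistic loss this bound has a simple closed form, since the sigmoid's derivative never exceeds 1/4; the step-size scaling η_j shown below is one standard way to use such a bound and is offered as an interpretation, not as the paper's exact rule.

```latex
% Diagonal Hessian entries of the logistic loss and a standard upper bound,
% with sigma(z) = 1 / (1 + e^{-z}) and sigma(z)(1 - sigma(z)) <= 1/4.
H_{jj} = \sum_{i=1}^{n} x_{ij}^{2}\,
         \sigma\bigl(x_i^{\top} w\bigr)\bigl(1 - \sigma\bigl(x_i^{\top} w\bigr)\bigr)
       \;\le\; \frac{1}{4}\sum_{i=1}^{n} x_{ij}^{2} =: U_j,
\qquad
\eta_j \propto \frac{1}{U_j}
```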