High-Performance Distributed ML at Scale through Parameter Server Consistency Models

Authors: Wei Dai, Abhimanu Kumar, Jinliang Wei, Qirong Ho, Garth Gibson, Eric Xing

AAAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our analyses and experiments show that ESSP combines the strengths of VAP and SSP: (1) ESSP achieves strong theoretical properties comparable to VAP; (2) ESSP can be efficiently implemented, with excellent empirical performance on two ML algorithms: matrix completion using SGD, and topic modeling using sampling." From the Experiments section: "We show that ESSP improves the speed and quality of convergence (versus SSP) for collapsed Gibbs sampling in topic modeling and stochastic gradient descent (SGD) in matrix factorization."
Researcher Affiliation | Academia | Wei Dai, Abhimanu Kumar, Jinliang Wei, Qirong Ho*, Garth Gibson and Eric P. Xing; School of Computer Science, Carnegie Mellon University; *Institute for Infocomm Research, A*STAR. Emails: {wdai, abhimank, jinlianw, garth, epxing}@cs.cmu.edu, hoqirong@gmail.com
Pseudocode | No | The paper describes its algorithms using mathematical formulas and textual explanations, but it does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | "This ESSP implementation will be made available soon as part of the Petuum project (www.petuum.org), an open-source framework for distributed Machine Learning."
Open Datasets | Yes | Topic model: New York Times corpus (N = 100m tokens, V = 100k vocabularies, and K = 100 topics). Matrix factorization: Netflix dataset (480k by 18k matrix with 100m nonzeros).
Dataset Splits | No | The paper mentions minibatch sizes ("50% minibatch" for LDA, "1% and 10% minibatch" for MF) but does not specify explicit training/validation/test dataset splits with percentages or sample counts.
Hardware Specification | Yes | Compute cluster: Matrix factorization experiments were run on 64 nodes, each with 2 cores and 16GB RAM, connected via 1Gbps Ethernet. LDA experiments were run on 8 nodes, each with 64 cores and 128GB memory, connected via 1Gbps Ethernet.
Software Dependencies | No | The paper mentions various software components and frameworks, such as the Parameter Server (PS), ESSPTable, Hadoop, Spark, and GraphLab, but does not provide specific version numbers for any of them.
Experiment Setup | Yes | "For LDA we use 50% minibatch in each Clock() call... For MF we use 1% and 10% minibatch in each Clock()... Unless stated otherwise, we use rank K = 100 and regularization parameter λ = 0.1."
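
The Research Type and Pseudocode rows describe SSP and ESSP only in prose, so a rough illustration of the consistency model may help. The sketch below is a toy, single-process stand-in for an SSP-style parameter table, not the Petuum/ESSPTable API: a get() may return a cached value as long as the cache lags the worker's logical clock by at most a fixed staleness bound, and ESSP keeps that same bound while eagerly pushing fresh values so reads are typically fresher. All class, method, and variable names here are assumptions made for illustration.

```python
# Toy, single-process stand-in for an SSP-style bounded-staleness table.
# All names are hypothetical; this is not the Petuum / ESSPTable interface.

class ToySSPTable:
    def __init__(self, staleness):
        self.staleness = staleness   # max number of clocks a cached read may lag
        self.server = {}             # authoritative values (the "server" copy)
        self.cache = {}              # worker-local cache: key -> (value, clock_when_fetched)
        self.clock = 0               # this worker's logical clock

    def get(self, key):
        value, seen = self.cache.get(key, (0.0, -1))
        # SSP read guarantee: the cached value may be used only if it is at most
        # `staleness` clocks old; otherwise refresh it from the server.
        if seen < self.clock - self.staleness:
            value = self.server.get(key, 0.0)
            self.cache[key] = (value, self.clock)
        return value

    def inc(self, key, delta):
        # Apply the update locally and to the server copy; a real system would
        # batch these deltas and ship them over the network.
        self.server[key] = self.server.get(key, 0.0) + delta
        value, seen = self.cache.get(key, (0.0, self.clock))
        self.cache[key] = (value + delta, seen)

    def tick(self):
        # End of one logical iteration (e.g. one minibatch / Clock() call).
        # ESSP would additionally push fresh server values to every worker here.
        self.clock += 1


# One worker's loop: read a parameter, apply a toy update, advance the clock.
table = ToySSPTable(staleness=3)
for _ in range(20):
    w = table.get("w")
    table.inc("w", -0.1 * (w - 1.0))   # pull w toward 1.0
    table.tick()
print(round(table.get("w"), 3))
```

In the paper's distributed setting, each worker runs such a loop over its data shard: SSP lets fast workers read values that lag by up to the staleness bound, while ESSP propagates updates eagerly, which is the behavior the paper credits for faster and better convergence than SSP.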
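
Similarly, the Experiment Setup row quotes the matrix factorization hyperparameters (rank K = 100, regularization parameter λ = 0.1, minibatches per Clock() call) without the surrounding objective. The single-machine sketch below runs SGD on an L2-regularized factorization with those two values; the toy data dimensions, learning rate, and epoch count are assumptions and are far smaller than the Netflix matrix used in the paper.

```python
# Minimal single-machine SGD for L2-regularized matrix factorization,
# using the hyperparameters quoted above (rank K = 100, lambda = 0.1).
# The toy data, learning rate, and epoch count are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
K, lam, lr = 100, 0.1, 0.005                 # rank, regularization, learning rate (assumed)
n_users, n_items, n_obs = 1000, 500, 20000   # tiny stand-in for Netflix's 480k x 18k

# Observed entries (i, j, rating); the paper uses the Netflix ratings matrix instead.
obs = [(rng.integers(n_users), rng.integers(n_items), rng.uniform(1, 5))
       for _ in range(n_obs)]

L = 0.1 * rng.standard_normal((n_users, K))   # user factors
R = 0.1 * rng.standard_normal((n_items, K))   # item factors

for epoch in range(5):
    rng.shuffle(obs)
    for i, j, r in obs:
        err = L[i] @ R[j] - r
        # Gradient of the squared error plus L2 penalty on both factor rows.
        gi = err * R[j] + lam * L[i]
        gj = err * L[i] + lam * R[j]
        L[i] -= lr * gi
        R[j] -= lr * gj
    rmse = np.sqrt(np.mean([(L[i] @ R[j] - r) ** 2 for i, j, r in obs]))
    print(f"epoch {epoch}: train RMSE {rmse:.3f}")
```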