Cyclades: Conflict-free Asynchronous Machine Learning

Authors: Xinghao Pan, Maximilian Lam, Stephen Tu, Dimitris Papailiopoulos, Ce Zhang, Michael I. Jordan, Kannan Ramchandran, Christopher Ré

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We implemented CYCLADES in C++ and tested it on a variety of problems, and a number of stochastic updates algorithms, and compared against their HOGWILD! (i.e., asynchronous, lock-free) implementations.
Researcher Affiliation | Academia | Department of Electrical Engineering and Computer Science, UC Berkeley, Berkeley, CA. Department of Computer Science, Stanford University, Palo Alto, CA. Department of Statistics, UC Berkeley, Berkeley, CA.
Pseudocode | Yes | Algorithm 1 Stochastic Updates
Open Source Code | Yes | Code is available at https://github.com/amplab/cyclades.
Open Datasets | Yes | Table 1: Details of datasets used in our experiments.
    Dataset | # datapoints | # features | av. sparsity / datapoint | Comments
    NH2010 | 48,838 | 48,838 | 4.8026 | Topological graph
    DBLP | 5,425,964 | 5,425,964 | 3.1880 | Authorship network
    MovieLens | 10M | 82,250 | 200 | 10M movie ratings
    EN-Wiki | 20,207,156 | 213,272 | 200 | Subset of English Wikipedia dump
Dataset Splits | No | The paper uses various datasets and mentions 'one random data reshuffling across all epochs' but does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) needed for reproducibility.
Hardware Specification | Yes | Our experiments were conducted on a machine with 72 CPUs (Intel(R) Xeon(R) CPU E7-8870 v3, 2.10 GHz) on 4 NUMA nodes, each with 18 CPUs, and 1TB of memory.
Software Dependencies | No | The paper states 'We implemented CYCLADES in C++' but does not specify versions for C++ compilers, libraries, or any other software dependencies crucial for reproduction.
Experiment Setup | Yes | We tune our constant stepsizes so to maximize convergence without diverging, and use one random data reshuffling across all epochs. Batch sizes are picked to optimize performance for CYCLADES.
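
The experiment setup quoted above (a constant stepsize and a single random reshuffling reused for every epoch) can be illustrated with a small serial sketch. This is not the CYCLADES code and does not reproduce its conflict-free parallel scheduling or the HOGWILD! baseline. The names `SparseExample` and `run_sgd` are hypothetical, and the least-squares loss is an assumed example objective.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical sparse datapoint: indices and values of its nonzero
// features, plus a regression target.
struct SparseExample {
    std::vector<std::size_t> idx;
    std::vector<double> val;
    double target;
};

// Serial stochastic updates for a least-squares loss (assumed objective).
// The visiting order is shuffled once and reused in every epoch, and the
// stepsize stays constant throughout.
std::vector<double> run_sgd(const std::vector<SparseExample>& data,
                            std::size_t num_features,
                            double stepsize,
                            int epochs,
                            unsigned seed) {
    std::vector<double> x(num_features, 0.0);

    std::vector<std::size_t> order(data.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);  // one reshuffle for all epochs

    for (int e = 0; e < epochs; ++e) {
        for (std::size_t i : order) {
            const SparseExample& ex = data[i];

            // Prediction uses only the coordinates in the example's support.
            double pred = 0.0;
            for (std::size_t k = 0; k < ex.idx.size(); ++k)
                pred += ex.val[k] * x[ex.idx[k]];

            // Gradient step touches the same sparse set of coordinates.
            const double err = pred - ex.target;
            for (std::size_t k = 0; k < ex.idx.size(); ++k)
                x[ex.idx[k]] -= stepsize * err * ex.val[k];
        }
    }
    return x;
}
```

Because each update reads and writes only the coordinates in one example's support, two examples with disjoint supports never touch the same entry of `x`. That sparsity is what the "av. sparsity / datapoint" column in Table 1 describes.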