Cyclades: Conflict-free Asynchronous Machine Learning
Authors: Xinghao Pan, Maximilian Lam, Stephen Tu, Dimitris Papailiopoulos, Ce Zhang, Michael I. Jordan, Kannan Ramchandran, Christopher Ré
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implemented CYCLADES in C++ and tested it on a variety of problems, and a number of stochastic updates algorithms, and compared against their HOGWILD! (i.e., asynchronous, lock-free) implementations. |
| Researcher Affiliation | Academia | Department of Electrical Engineering and Computer Science, UC Berkeley, Berkeley, CA. Department of Computer Science, Stanford University, Palo Alto, CA. Department of Statistics, UC Berkeley, Berkeley, CA. |
| Pseudocode | Yes | Algorithm 1 Stochastic Updates |
| Open Source Code | Yes | Code is available at https://github.com/amplab/cyclades. |
| Open Datasets | Yes | Table 1: Details of datasets used in our experiments. NH2010: 48,838 datapoints, 48,838 features, av. sparsity / datapoint 4.8026 (topological graph). DBLP: 5,425,964 datapoints, 5,425,964 features, av. sparsity / datapoint 3.1880 (authorship network). MovieLens: 10M datapoints, 82,250 features, av. sparsity / datapoint 200 (10M movie ratings). EN-Wiki: 20,207,156 datapoints, 213,272 features, av. sparsity / datapoint 200 (subset of English Wikipedia dump). |
| Dataset Splits | No | The paper uses various datasets and mentions 'one random data reshuffling across all epochs' but does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) needed for reproducibility. |
| Hardware Specification | Yes | Our experiments were conducted on a machine with 72 CPUs (Intel(R) Xeon(R) CPU E7-8870 v3, 2.10 GHz) on 4 NUMA nodes, each with 18 CPUs, and 1TB of memory. |
| Software Dependencies | No | The paper states 'We implemented CYCLADES in C++' but does not specify versions for C++ compilers, libraries, or any other software dependencies crucial for reproduction. |
| Experiment Setup | Yes | We tune our constant stepsizes so to maximize convergence without diverging, and use one random data reshuffling across all epochs. Batch sizes are picked to optimize performance for CYCLADES. |
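
The experiment setup row mentions constant stepsizes and a single random reshuffling reused across all epochs. The sketch below illustrates only that data-ordering and stepsize convention, in serial Python rather than the paper's C++. It is not CYCLADES itself: there is no conflict graph, batching, or parallelism. The function name `sgd_single_shuffle`, the least-squares objective, and the sparse-dict data layout are assumptions made for illustration.

```python
import random


def sgd_single_shuffle(data, dim, stepsize, epochs, seed=0):
    """Serial constant-stepsize SGD on a least-squares objective.

    `data` is a list of (x, y) pairs, where x is a sparse datapoint stored as
    a dict {feature index: value}. All names here are hypothetical.
    """
    # Shuffle the datapoint order once; every epoch reuses this same order.
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)

    w = [0.0] * dim
    for _ in range(epochs):
        for i in order:
            x, y = data[i]
            residual = sum(w[j] * v for j, v in x.items()) - y
            # The update touches only the coordinates in the support of x.
            for j, v in x.items():
                w[j] -= stepsize * residual * v  # stepsize stays constant
    return w, order
```

Calling `sgd_single_shuffle(data, dim=2, stepsize=0.1, epochs=200)` on a small consistent system returns the fitted weights together with the fixed visiting order. Because the order is drawn once from a seeded generator, reruns visit datapoints in the same sequence.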