reproducibilityindex.ai

Cyclades: Conflict-free Asynchronous Machine Learning

Authors: Xinghao Pan, Maximilian Lam, Stephen Tu, Dimitris Papailiopoulos, Ce Zhang, Michael I. Jordan, Kannan Ramchandran, Christopher Ré

NeurIPS 2016

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We implemented CYCLADES in C++ and tested it on a variety of problems, and a number of stochastic updates algorithms, and compared against their HOGWILD! (i.e., asynchronous, lock-free) implementations.
Researcher Affiliation	Academia	Department of Electrical Engineering and Computer Science, UC Berkeley, Berkeley, CA. Department of Computer Science, Stanford University, Palo Alto, CA. Department of Statistics, UC Berkeley, Berkeley, CA.
Pseudocode	Yes	Algorithm 1 Stochastic Updates
Open Source Code	Yes	Code is available at https://github.com/amplab/cyclades.
Open Datasets	Yes	Table 1: Details of datasets used in our experiments.
Dataset | # datapoints | # features | Av. sparsity / datapoint | Comments
NH2010 | 48,838 | 48,838 | 4.8026 | Topological graph
DBLP | 5,425,964 | 5,425,964 | 3.1880 | Authorship network
MovieLens | 10M | 82,250 | 200 | 10M movie ratings
EN-Wiki | 20,207,156 | 213,272 | 200 | Subset of English Wikipedia dump
Dataset Splits	No	The paper uses various datasets and mentions 'one random data reshuffling across all epochs' but does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) needed for reproducibility.
Hardware Specification	Yes	Our experiments were conducted on a machine with 72 CPUs (Intel(R) Xeon(R) CPU E7-8870 v3, 2.10 GHz) on 4 NUMA nodes, each with 18 CPUs, and 1TB of memory.
Software Dependencies	No	The paper states 'We implemented CYCLADES in C++' but does not specify versions for C++ compilers, libraries, or any other software dependencies crucial for reproduction.
Experiment Setup	Yes	We tune our constant stepsizes so to maximize convergence without diverging, and use one random data reshuffling across all epochs. Batch sizes are picked to optimize performance for CYCLADES.
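The table records that CYCLADES was compared against lock-free HOGWILD! implementations and that the paper gives pseudocode for stochastic updates (Algorithm 1). As we understand the paper, CYCLADES avoids HOGWILD!'s update conflicts in three steps. It samples a batch of updates, links any two updates that touch a common model coordinate, and hands each connected component of that conflict graph to a single core. What follows is a minimal single-process Python sketch of the grouping step only, not the authors' C++ implementation. The names `support`, `connected_components`, and `assign_to_cores` are hypothetical. Union-find is our choice for computing components, and the largest-first greedy load balancing is our own assumption.

```python
from collections import defaultdict


def connected_components(batch, support):
    """Group the update indices in `batch` whose supports overlap.

    Assumption (hypothetical interface): `support[i]` is the set of model
    coordinates that update i reads or writes, and `batch` holds distinct
    indices. Components are computed with union-find; each returned
    component keeps the updates in their original batch order.
    """
    parent = {i: i for i in batch}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    owner = {}  # coordinate -> first update in the batch that touches it
    for i in batch:
        for c in support[i]:
            if c in owner:
                union(owner[c], i)  # shared coordinate: same component
            else:
                owner[c] = i

    groups = defaultdict(list)
    for i in batch:
        groups[find(i)].append(i)
    return list(groups.values())


def assign_to_cores(components, n_cores):
    """Greedy balancing (our assumption): largest component to least-loaded core."""
    cores = [[] for _ in range(n_cores)]
    for comp in sorted(components, key=len, reverse=True):
        min(cores, key=len).extend(comp)
    return cores
```

Because no model coordinate is touched by two cores within a batch, each core can apply its updates serially without locks. This serial-equivalence property is what the conflict-free design relies on. A real implementation would also sample batches of a tuned size and run the cores as parallel threads, which this sketch omits.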
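The Experiment Setup row describes two choices. Constant step sizes are tuned to converge as fast as possible without diverging, and the data are shuffled once at random, with that order reused across all epochs. A minimal Python sketch of that protocol on a toy least-squares problem follows, under our own assumptions. The model, the loss, and the candidate step sizes are ours, as are the hypothetical names `sgd_fixed_shuffle`, `squared_loss`, and `tune_step_size`. The grid search is only one plausible reading of how the step sizes were tuned.

```python
import math
import random


def sgd_fixed_shuffle(data, dim, step_size, epochs, seed=0):
    """Plain SGD for least squares (y ~ w.x) with a constant step size.

    The data order is shuffled once and the same permutation is reused in
    every epoch, i.e. there is no per-epoch reshuffle.
    """
    rng = random.Random(seed)
    order = list(range(len(data)))
    rng.shuffle(order)  # the single random reshuffle shared by all epochs
    w = [0.0] * dim
    for _ in range(epochs):
        for i in order:
            x, y = data[i]
            residual = sum(wj * xj for wj, xj in zip(w, x)) - y
            for j in range(dim):
                w[j] -= step_size * residual * x[j]
    return w, order


def squared_loss(w, data):
    """Mean squared residual; r * r (not r ** 2) avoids OverflowError on huge floats."""
    total = 0.0
    for x, y in data:
        r = sum(wj * xj for wj, xj in zip(w, x)) - y
        total += r * r
    return total / len(data)


def tune_step_size(data, dim, candidates, epochs):
    """Hypothetical grid search: lowest final loss among runs that stay finite."""
    best = None
    for eta in candidates:
        w, _ = sgd_fixed_shuffle(data, dim, eta, epochs)
        loss = squared_loss(w, data)
        if math.isfinite(loss) and (best is None or loss < best[1]):
            best = (eta, loss)
    return best
```

Fixing the `seed` keeps the single permutation, and therefore the whole run, reproducible. A run that diverges ends with a non-finite loss and is skipped rather than selected.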