Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

On Model Parallelization and Scheduling Strategies for Distributed Machine Learning

Authors: Seunghak Lee, Jin Kyu Kim, Xun Zheng, Qirong Ho, Garth A Gibson, Eric P Xing

NeurIPS 2014 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the efficacy of model-parallel algorithms implemented on STRADS versus popular implementations for topic modeling, matrix factorization, and Lasso. We conducted experiments on two clusters...
Researcher Affiliation	Academia	School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 seunghak@, jinkyuk@, xunzheng@, garth@, EMAIL Institute for Infocomm Research A*STAR Singapore 138632 EMAIL
Pseudocode	Yes	Figure 2: STRADS interface: Basic functional signatures of schedule, push, pull, using pseudocode. Figure 3: STRADS LDA pseudocode. Figure 5: STRADS MF pseudocode. Figure 6: STRADS Lasso pseudocode.
Open Source Code	No	The paper does not provide a statement or link indicating that its own source code is open or publicly available. It mentions using third-party tools like Open MPI.
Open Datasets	Yes	We used 3.9M English Wikipedia abstracts, and conducted experiments using both unigram (1-word) tokens (V = 2.5M unique unigrams, 179M tokens) and bigram (2-word) tokens [16] (V = 21.8M unique bigrams, 79M tokens). ... We used the Netflix dataset [2] for our MF experiments: 100M anonymized ratings from 480,189 users on 17,770 movies.
Dataset Splits	No	The paper describes the datasets used and the experimental setup for convergence and scalability, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or exact counts for each split).
Hardware Specification	Yes	The 2-core cluster contains 128 machines, each with two 2.6GHz AMD cores and 8GB RAM, and connected via a 1Gbps network interface. The 16-core cluster contains 9 machines, each with 16 2.1GHz AMD cores and 64GB RAM, and connected via a 40Gbps network interface.
Software Dependencies	Yes	We implemented STRADS using C++ and the Boost libraries, and Open MPI 1.4.5 was used for asynchronous communication between the master schedulers, workers, and key-value stores.
Experiment Setup	Yes	for Lasso, we set λ = 0.001, and for MF, we set λ = 0.05. ... We set the number of topics to K = 5000 and 10000 (also larger than recent literature [1]). ... We varied the rank of W, H from K = 20 to 2000, which exceeds the upper limit of previous MF papers [26, 10, 24].