Distributed Balanced Clustering via Mapping Coresets
Authors: Mohammadhossein Bateni, Aditya Bhaskara, Silvio Lattanzi, Vahab Mirrokni
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In order to gauge its practicality, we implement our algorithm. We are interested in measuring its scalability in addition to the effect of having several rounds on the quality of the solution. Our experiments deal with two instances to test this effect: the larger instance is the world graph with hundreds of millions of nodes, and the smaller one is the graph of US road networks with tens of millions of nodes. Table 2 shows that the quality of the solution does not degrade substantially if we use the two-round algorithm, more suited to parallel implementation. Figure 2 shows the increase in running time is sublinear. (A simplified sketch of this two-round pattern follows the table.) |
| Researcher Affiliation | Industry | Mohammad Hossein Bateni (Google NYC, bateni@google.com); Aditya Bhaskara (Google NYC, bhaskaraaditya@google.com); Silvio Lattanzi (Google NYC, silviol@google.com); Vahab Mirrokni (Google NYC, mirrokni@google.com) |
| Pseudocode | No | The paper describes algorithmic steps in prose (e.g., in Sections 3.1 and 3.2) but does not provide structured pseudocode blocks or algorithms labeled as such. |
| Open Source Code | No | The paper does not contain any statements about making its source code publicly available or providing a link to a code repository. |
| Open Datasets | No | The paper mentions 'world graph' and 'US road networks' as instances for experiments, but does not provide concrete access information (e.g., a specific URL, DOI, or a formal citation to a dataset paper with authors and year) for these datasets. |
| Dataset Splits | Yes | In case of the US Graph, the sequential algorithm was run on a random 1/300 subset of the actual graph, whereas a random 1/1000 subset was used for the World Graph. We focus on the bigger instance (World Graph) and once again take its random samples of different sizes (10% up to 100%). (A sketch of this subsampling follows the table.) |
| Hardware Specification | No | The paper states 'run our parallel algorithms on a few hundred machines' but does not provide any specific details about the CPU, GPU, or memory of these machines. |
| Software Dependencies | No | The paper describes algorithms and experiments but does not list any specific software or library names with version numbers (e.g., Python, TensorFlow, PyTorch, CPLEX). |
| Experiment Setup | Yes | We always look for 1000 clusters, and run our parallel algorithms on a few hundred machines. In case of the US Graph, the sequential algorithm was run on a random 1/300 subset of the actual graph, whereas a random 1/1000 subset was used for the World Graph. |
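
To make the quoted experimental setup more concrete, below is a minimal Python sketch of the two-round, coreset-style pattern the paper describes: in round one, each machine summarizes its shard of the data into a small weighted coreset; in round two, a single machine clusters the union of those coresets. All function names here are illustrative stand-ins, and the weighted k-means subroutine substitutes for the paper's actual mapping-coreset construction and balanced-clustering subroutine, which are not reproduced in this excerpt.

```python
import math
import random

def dist(a, b):
    return math.dist(a, b)

def local_coreset(points, size):
    """Round 1 (per machine): summarize this machine's shard by `size`
    representatives, each weighted by how many points it stands in for."""
    reps = random.sample(points, min(size, len(points)))
    weights = [0] * len(reps)
    for p in points:
        j = min(range(len(reps)), key=lambda i: dist(p, reps[i]))
        weights[j] += 1
    return list(zip(reps, weights))

def weighted_kmeans(weighted_points, k, iters=20):
    """Round 2 (single machine): Lloyd's k-means on the weighted union of
    all coresets; a stand-in for the paper's balanced-clustering step."""
    pts = [p for p, _ in weighted_points]
    ws = [w for _, w in weighted_points]
    centers = random.sample(pts, k)
    for _ in range(iters):
        # assign each weighted representative to its nearest center
        assign = [min(range(k), key=lambda j: dist(p, centers[j])) for p in pts]
        for j in range(k):
            members = [(p, w) for p, w, a in zip(pts, ws, assign) if a == j]
            if members:
                tot = sum(w for _, w in members)
                centers[j] = tuple(
                    sum(p[d] * w for p, w in members) / tot
                    for d in range(len(pts[0]))
                )
    return centers

if __name__ == "__main__":
    random.seed(0)
    data = [(random.random(), random.random()) for _ in range(10_000)]
    machines = 10   # stand-in for the paper's "few hundred machines"
    k = 5           # the paper always looks for 1000 clusters
    shards = [data[i::machines] for i in range(machines)]
    coresets = [c for shard in shards for c in local_coreset(shard, 50)]
    centers = weighted_kmeans(coresets, k)
    print(f"{len(centers)} centers from {len(coresets)} coreset points")
```

The key property this pattern illustrates is why the two-round variant scales: only the small weighted coresets, not the raw data, move between machines.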
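
The random subsets used for the sequential baseline and the scalability sweep are, as quoted, simple fractions of the instance. A minimal sketch, assuming uniform node sampling (the paper does not specify the sampling scheme beyond "random"):

```python
import random

def random_subsample(items, fraction, seed=0):
    """Uniform random subset containing the given fraction of the items."""
    rng = random.Random(seed)
    k = max(1, round(fraction * len(items)))
    return rng.sample(items, k)

nodes = list(range(1_000_000))  # stand-in for a graph's node set
us_baseline = random_subsample(nodes, 1 / 300)      # US Graph baseline
world_baseline = random_subsample(nodes, 1 / 1000)  # World Graph baseline
# scalability sweep: random samples of 10% up to 100% of the instance
sweep = [random_subsample(nodes, f / 10, seed=f) for f in range(1, 11)]
print(len(us_baseline), len(world_baseline), [len(s) for s in sweep])
```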