Distributed Balanced Clustering via Mapping Coresets
Authors: Mohammadhossein Bateni, Aditya Bhaskara, Silvio Lattanzi, Vahab Mirrokni
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In order to gauge its practicality, we implement our algorithm. We are interested in measuring its scalability in addition to the effect of having several rounds on the quality of the solution. Our experiments deal with two instances to test this effect: the larger instance is the world graph with hundreds of millions of nodes, and the smaller one is the graph of US road networks with tens of millions of nodes. Table 2 shows that the quality of the solution does not degrade substantially if we use the two-round algorithm, more suited to parallel implementation. Figure 2 shows the increase in running time is sublinear. (A simplified sketch of this two-round pattern follows the table.) |
| Researcher Affiliation | Industry | Mohammad Hossein Bateni (Google NYC, bateni@google.com); Aditya Bhaskara (Google NYC, bhaskaraaditya@google.com); Silvio Lattanzi (Google NYC, silviol@google.com); Vahab Mirrokni (Google NYC, mirrokni@google.com) |
| Pseudocode | No | The paper describes algorithmic steps in prose (e.g., in Sections 3.1 and 3.2) but does not provide structured pseudocode blocks or algorithms labeled as such. |
| Open Source Code | No | The paper does not contain any statements about making its source code publicly available or providing a link to a code repository. |
| Open Datasets | No | The paper mentions 'world graph' and 'US road networks' as instances for experiments, but does not provide concrete access information (e.g., a specific URL, DOI, or a formal citation to a dataset paper with authors and year) for these datasets. |
| Dataset Splits | Yes | In case of the US Graph, the sequential algorithm was run on a random 1/300 subset of the actual graph, whereas a random 1/1000 subset was used for the World Graph. We focus on the bigger instance (World Graph) and once again take its random samples of different sizes (10% up to 100%). (A sketch of this subsampling follows the table.) |
| Hardware Specification | No | The paper states 'run our parallel algorithms on a few hundred machines' but does not provide any specific details about the CPU, GPU, or memory of these machines. |
| Software Dependencies | No | The paper describes algorithms and experiments but does not list any specific software or library names with version numbers (e.g., Python, TensorFlow, PyTorch, CPLEX). |
| Experiment Setup | Yes | We always look for 1000 clusters, and run our parallel algorithms on a few hundred machines. In case of the US Graph, the sequential algorithm was run on a random 1/300 subset of the actual graph, whereas a random 1/1000 subset was used for the World Graph. |
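
To make the quoted experimental setup more concrete, below is a minimal Python sketch of the two-round, coreset-style pattern the paper describes: in round one, each machine summarizes its shard of the data into a small weighted coreset; in round two, a single machine clusters the union of those coresets. All function names here are illustrative stand-ins, and the weighted k-means subroutine substitutes for the paper's actual mapping-coreset construction and balanced-clustering subroutine, which are not reproduced in this excerpt.

```python
import math
import random

def dist(a, b):
    return math.dist(a, b)

def local_coreset(points, size):
    """Round 1 (per machine): summarize this machine's shard by `size`
    representatives, each weighted by how many points it stands in for."""
    reps = random.sample(points, min(size, len(points)))
    weights = [0] * len(reps)
    for p in points:
        j = min(range(len(reps)), key=lambda i: dist(p, reps[i]))
        weights[j] += 1
    return list(zip(reps, weights))

def weighted_kmeans(weighted_points, k, iters=20):
    """Round 2 (single machine): Lloyd's k-means on the weighted union of
    all coresets; a stand-in for the paper's balanced-clustering step."""
    pts = [p for p, _ in weighted_points]
    ws = [w for _, w in weighted_points]
    centers = random.sample(pts, k)
    for _ in range(iters):
        # assign each weighted representative to its nearest center
        assign = [min(range(k), key=lambda j: dist(p, centers[j])) for p in pts]
        for j in range(k):
            members = [(p, w) for p, w, a in zip(pts, ws, assign) if a == j]
            if members:
                tot = sum(w for _, w in members)
                centers[j] = tuple(
                    sum(p[d] * w for p, w in members) / tot
                    for d in range(len(pts[0]))
                )
    return centers

if __name__ == "__main__":
    random.seed(0)
    data = [(random.random(), random.random()) for _ in range(10_000)]
    machines = 10   # stand-in for the paper's "few hundred machines"
    k = 5           # the paper always looks for 1000 clusters
    shards = [data[i::machines] for i in range(machines)]
    coresets = [c for shard in shards for c in local_coreset(shard, 50)]
    centers = weighted_kmeans(coresets, k)
    print(f"{len(centers)} centers from {len(coresets)} coreset points")
```

The key property this pattern illustrates is why the two-round variant scales: only the small weighted coresets, not the raw data, move between machines.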
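
The random subsets used for the sequential baseline and the scalability sweep are, as quoted, simple fractions of the instance. A minimal sketch, assuming uniform node sampling (the paper does not specify the sampling scheme beyond "random"):

```python
import random

def random_subsample(items, fraction, seed=0):
    """Uniform random subset containing the given fraction of the items."""
    rng = random.Random(seed)
    k = max(1, round(fraction * len(items)))
    return rng.sample(items, k)

nodes = list(range(1_000_000))  # stand-in for a graph's node set
us_baseline = random_subsample(nodes, 1 / 300)      # US Graph baseline
world_baseline = random_subsample(nodes, 1 / 1000)  # World Graph baseline
# scalability sweep: random samples of 10% up to 100% of the instance
sweep = [random_subsample(nodes, f / 10, seed=f) for f in range(1, 11)]
print(len(us_baseline), len(world_baseline), [len(s) for s in sweep])
```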