Distributed Weighted Matching via Randomized Composable Coresets
Authors: Sepehr Assadi, Mohammadhossein Bateni, Vahab Mirrokni
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we develop a simple distributed algorithm for the problem on general graphs with approximation guarantee of 2+ϵ that (nearly) matches that of the sequential greedy algorithm. We also demonstrate the efficiency of our algorithm in practice on various graphs (some with half a trillion edges) by achieving objective values always close to what is achievable in the centralized setting. In this section, we report the results of evaluating our algorithm on a number of publicly available datasets. |
| Researcher Affiliation | Collaboration | 1Department of Computer Science, Princeton University, Princeton, NJ, US. Supported in part by the Simons Collaboration on Algorithms and Geometry. Majority of the work done while this author was a summer intern at Google Research, New York. 2Google Research, New York, NY, US. |
| Pseudocode | No | The paper describes algorithms but does not provide structured pseudocode blocks or figures labeled as 'Algorithm' or 'Pseudocode'; an illustrative sketch of the described pipeline is given after this table. |
| Open Source Code | No | The paper does not provide a direct link to open-source code for the described methodology or state that their code is publicly available. |
| Open Datasets | Yes | We use different datasets with varying sizes, ranging from about six million to half a trillion edges. Three of our datasets (namely Friendster, Orkut, and Live Journal, all taken from SNAP) are the public datasets used in evaluating a hierarchical clustering algorithm in (Bateni et al., 2017). We also add a fourth dataset of our own based on public DBLP coauthorship (Demaine & Hajiaghayi, 2014). |
| Dataset Splits | No | The paper describes the datasets used but does not specify explicit training, validation, and test splits (e.g., percentages or counts) or reference predefined splits for reproducibility. |
| Hardware Specification | No | We implement our algorithm on a real proprietary Map Reduce-like cluster, i.e., no simulations, with tens to hundreds of machines, depending on the size of the dataset. The actual running times and number of machines are withheld due to company’s policy so as to not reveal specifics of hardware and architecture in our company’s clusters. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks. |
| Experiment Setup | No | The paper describes the algorithm's implementation details (e.g., running greedy as post-processing) but does not provide specific experimental setup details such as hyperparameters (learning rate, batch size, epochs), model initialization, or other training configurations. |
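
Since the paper releases neither pseudocode nor source code, the following single-machine simulation sketches the kind of pipeline its title and evaluation describe: randomly partition the edges across machines, have each machine return a matching of its piece as a randomized composable coreset, and run greedy as post-processing on the union of coresets (the post-processing step is explicitly mentioned in the paper). This is not the authors' implementation; the use of the same greedy rule for the per-machine step, the function names, the seeded partitioner, and the toy graph are assumptions made for illustration only, and the real system runs on a proprietary MapReduce-like cluster.

```python
import random

def greedy_matching(edges):
    """Sequential greedy matching: scan edges in non-increasing weight order
    and keep an edge whenever both of its endpoints are still unmatched."""
    matched = set()
    matching = []
    for u, v, w in sorted(edges, key=lambda e: -e[2]):
        if u not in matched and v not in matched:
            matching.append((u, v, w))
            matched.add(u)
            matched.add(v)
    return matching

def coreset_matching(edges, num_machines, seed=0):
    """Illustrative randomized-composable-coreset pipeline (assumed structure):
    1) randomly partition the edge set across `num_machines` machines,
    2) each machine computes a greedy matching of its piece (its coreset),
    3) greedy post-processing is run on the union of all coresets."""
    rng = random.Random(seed)
    pieces = [[] for _ in range(num_machines)]
    for e in edges:
        pieces[rng.randrange(num_machines)].append(e)   # random edge partition
    coresets = [greedy_matching(p) for p in pieces]      # parallel in a real cluster
    union = [e for coreset in coresets for e in coreset]
    return greedy_matching(union)                        # greedy post-processing

if __name__ == "__main__":
    # Toy weighted graph given as (u, v, weight) triples; purely illustrative.
    edges = [(1, 2, 5.0), (2, 3, 4.0), (3, 4, 6.0), (1, 4, 1.0), (2, 4, 2.5)]
    print(coreset_matching(edges, num_machines=2))
```

The sketch only conveys the structure of the approach; it says nothing about the withheld cluster configuration, machine counts, or running times that the reproducibility assessment above flags as missing.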