Distributed Weighted Matching via Randomized Composable Coresets
Authors: Sepehr Assadi, Mohammadhossein Bateni, Vahab Mirrokni
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we develop a simple distributed algorithm for the problem on general graphs with approximation guarantee of 2+ϵ that (nearly) matches that of the sequential greedy algorithm. We also demonstrate the efficiency of our algorithm in practice on various graphs (some with half a trillion edges) by achieving objective values always close to what is achievable in the centralized setting. In this section, we report the results of evaluating our algorithm on a number of publicly available datasets. |
| Researcher Affiliation | Collaboration | 1Department of Computer Science, Princeton University, Princeton, NJ, US. Supported in part by the Simons Collaboration on Algorithms and Geometry. Majority of the work done while this author was a summer intern at Google Research, New York. 2Google Research, New York, NY, US. |
| Pseudocode | No | The paper describes algorithms but does not provide structured pseudocode blocks or figures labeled as 'Algorithm' or 'Pseudocode'; an illustrative sketch of the described pipeline is given after this table. |
| Open Source Code | No | The paper does not provide a direct link to open-source code for the described methodology or state that their code is publicly available. |
| Open Datasets | Yes | We use different datasets with varying sizes, ranging from about six million to half a trillion edges. Three of our datasets (namely Friendster, Orkut, and Live Journal, all taken from SNAP) are the public datasets used in evaluating a hierarchical clustering algorithm in (Bateni et al., 2017). We also add a fourth dataset of our own based on public DBLP coauthorship (Demaine & Hajiaghayi, 2014). |
| Dataset Splits | No | The paper describes the datasets used but does not specify explicit training, validation, and test splits (e.g., percentages or counts) or reference predefined splits for reproducibility. |
| Hardware Specification | No | We implement our algorithm on a real proprietary Map Reduce-like cluster, i.e., no simulations, with tens to hundreds of machines, depending on the size of the dataset. The actual running times and number of machines are withheld due to company’s policy so as to not reveal specifics of hardware and architecture in our company’s clusters. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks. |
| Experiment Setup | No | The paper describes the algorithm's implementation details (e.g., running greedy as post-processing) but does not provide specific experimental setup details such as hyperparameters (learning rate, batch size, epochs), model initialization, or other training configurations. |
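
Since the paper releases neither pseudocode nor source code, the following single-machine simulation sketches the kind of pipeline its title and evaluation describe: randomly partition the edges across machines, have each machine return a matching of its piece as a randomized composable coreset, and run greedy as post-processing on the union of coresets (the post-processing step is explicitly mentioned in the paper). This is not the authors' implementation; the use of the same greedy rule for the per-machine step, the function names, the seeded partitioner, and the toy graph are assumptions made for illustration only, and the real system runs on a proprietary MapReduce-like cluster.

```python
import random

def greedy_matching(edges):
    """Sequential greedy matching: scan edges in non-increasing weight order
    and keep an edge whenever both of its endpoints are still unmatched."""
    matched = set()
    matching = []
    for u, v, w in sorted(edges, key=lambda e: -e[2]):
        if u not in matched and v not in matched:
            matching.append((u, v, w))
            matched.add(u)
            matched.add(v)
    return matching

def coreset_matching(edges, num_machines, seed=0):
    """Illustrative randomized-composable-coreset pipeline (assumed structure):
    1) randomly partition the edge set across `num_machines` machines,
    2) each machine computes a greedy matching of its piece (its coreset),
    3) greedy post-processing is run on the union of all coresets."""
    rng = random.Random(seed)
    pieces = [[] for _ in range(num_machines)]
    for e in edges:
        pieces[rng.randrange(num_machines)].append(e)   # random edge partition
    coresets = [greedy_matching(p) for p in pieces]      # parallel in a real cluster
    union = [e for coreset in coresets for e in coreset]
    return greedy_matching(union)                        # greedy post-processing

if __name__ == "__main__":
    # Toy weighted graph given as (u, v, weight) triples; purely illustrative.
    edges = [(1, 2, 5.0), (2, 3, 4.0), (3, 4, 6.0), (1, 4, 1.0), (2, 4, 2.5)]
    print(coreset_matching(edges, num_machines=2))
```

The sketch only conveys the structure of the approach; it says nothing about the withheld cluster configuration, machine counts, or running times that the reproducibility assessment above flags as missing.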