Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Community Recovery in the Geometric Block Model

Authors: Sainyam Galhotra, Arya Mazumdar, Soumyabrata Pal, Barna Saha

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We simulate our results on both real and synthetic datasets to show superior performance of both the new model as well as our algorithm. Keywords: Random graphs, Community recovery, Generative model, Graph clustering, Random geometric graphs." (...) "In addition to validation experiments in Section 1.1, we also conducted an in-depth experimentation of our proposed model and techniques over a set of synthetic and real world networks." (...) Section 7, Experimental Results
Researcher Affiliation | Collaboration | Sainyam Galhotra, Department of Computer Science, Cornell University (...) Arya Mazumdar, Halicioglu Data Science Institute, University of California, San Diego (...) Soumyabrata Pal, Google Research, Bengaluru (...) Barna Saha, Department of Computer Science and Halicioglu Data Science Institute, University of California, San Diego
Pseudocode | Yes | "Algorithm 1: Cluster recovery in GBM1. Require: GBM1 G = (V, E), rs, rd" (...) "Algorithm 2: process. Require: u, v, rs, rd. Ensure: true/false"
Open Source Code | No | The paper contains no explicit statement about releasing source code and provides no link to a code repository for the described methodology.
Open Datasets | Yes | "The first dataset that we use in our experiments is the Amazon product metadata on SNAP (https://snap.stanford.edu/data/amazon-meta.html)" (...) Political Blogs (Adamic and Glance, 2005) (...) DBLP (Yang and Leskovec, 2015) (...) Live Journal (Leskovec et al., 2007)
Dataset Splits | No | The paper mentions extracting communities of specific sizes for DBLP and Live Journal (e.g., the "top two communities of size 4500 and 7500 respectively" for DBLP and the "top two clusters of sizes 930 and 1400" for Live Journal) and describes sampling subgraphs using thresholds T1, T2, and T3. However, it provides no explicit training/validation/test splits with percentages or sample counts, and no reference to predefined standard splits for model evaluation.
Hardware Specification | No | The paper does not describe the hardware used for its experiments; no GPU models, CPU models, or other machine specifications are mentioned.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., Python, PyTorch, or scikit-learn versions) needed to replicate the experiments.
Experiment Setup | No | The paper frames its experimental setup around thresholds T1, T2, and T3 for subgraph sampling and decision making, but characterizes them only qualitatively ("a somewhat large threshold T1", "a small threshold T2") and gives no concrete numerical values for them or for any other hyperparameters, whether for its own algorithm or for the baselines (spectral clustering, correlation clustering).
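The Pseudocode row above refers to Algorithm 1 (cluster recovery in GBM1) and the subroutine `process(u, v, rs, rd)`, but this page only records their signatures. As a rough illustration of the kind of computation involved, the sketch below samples a two-community geometric block model and applies a common-neighbor threshold test to an edge; the sampler, the `same_community` routine, and its `threshold` parameter are hypothetical stand-ins for exposition, not the paper's algorithm.

```python
import random
from itertools import combinations

def sample_gbm1(n, rs, rd, seed=0):
    """Sample a two-community GBM1-style graph (illustrative, not the
    paper's exact construction): each vertex gets a uniform latent
    position on the unit circle [0, 1); an intra-community pair is
    joined when its circular distance is <= rs, an inter-community
    pair when it is <= rd (with rd < rs)."""
    rng = random.Random(seed)
    pos = [rng.random() for _ in range(n)]
    label = [i % 2 for i in range(n)]          # ground-truth communities
    adj = {i: set() for i in range(n)}
    for u, v in combinations(range(n), 2):
        d = abs(pos[u] - pos[v])
        d = min(d, 1.0 - d)                    # circular (wrap-around) distance
        radius = rs if label[u] == label[v] else rd
        if d <= radius:
            adj[u].add(v)
            adj[v].add(u)
    return adj, label

def same_community(adj, u, v, threshold):
    """Hypothetical stand-in for the `process` subroutine: declare the
    endpoints of an edge to lie in the same community when they share
    at least `threshold` common neighbors. Intra-community edges in a
    geometric model tend to have more common neighbors than
    inter-community ones, which is what a test like this exploits."""
    return len(adj[u] & adj[v]) >= threshold
```

With rs well above rd, edges inside a community accumulate noticeably more common neighbors than edges across communities, so a single count threshold can separate the two edge types; the paper's actual thresholds (the T1, T2, T3 mentioned in the Experiment Setup row) are not given numerically, which is precisely the reproducibility gap this report flags.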