Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Community Recovery in Graphs with Locality

Authors: Yuxin Chen, Govinda Kamath, Changho Suh, David Tse

ICML 2016

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | To verify the practical applicability of the proposed algorithms, we have conducted simulations in various settings. All these experiments focused on graphs with n = 100,000 vertices, and used an error rate of θ = 10% unless otherwise noted. For each point, the empirical success rates averaged over 10 Monte Carlo runs are reported. To evaluate the performance of our algorithm on real data, we ran Spectral-Stitching for Chromosomes 1-22 on the NA12878 data-set made available by 10x-Genomics (10x Genomics, 2015). |
| Researcher Affiliation | Academia | Yuxin Chen EMAIL, Govinda M. Kamath EMAIL, Changho Suh EMAIL, David Tse+ EMAIL; Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA; Department of Electrical Engineering, KAIST, Daejeon 305-701, Korea; +Department of EECS, University of California, Berkeley, CA 94720, USA |
| Pseudocode | Yes | Algorithm 1: Spectral-Expanding and Algorithm 2: Spectral-Stitching |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the described methodology or links to a repository. |
| Open Datasets | Yes | To evaluate the performance of our algorithm on real data, we ran Spectral-Stitching for Chromosomes 1-22 on the NA12878 data-set made available by 10x-Genomics (10x Genomics, 2015). The nominal error rate per read is p = 1%, and the average number of SNPs touched by each sample is L ∈ [6, 7]. The number of SNPs n ranges from 34240 to 191829, with the sample size m from 102633 to 574189. |
| Dataset Splits | No | The paper describes experiments and simulations using n = 100,000 vertices and Monte Carlo runs, but does not provide specific details on dataset splits (e.g., train/validation/test percentages or counts) or cross-validation setup. |
| Hardware Specification | Yes | The time taken to run Spectral-Expanding on a MacBook Pro equipped with a 2.9 GHz Intel Core i5 and 8GB of memory over rings R_r, where n = 100,000, θ = 10% and m = 1.5m. |
| Software Dependencies | No | The paper does not provide specific software dependencies or library names with version numbers needed to replicate the experiment. |
| Experiment Setup | Yes | All these experiments focused on graphs with n = 100,000 vertices, and used an error rate of θ = 10% unless otherwise noted. For each point, the empirical success rates averaged over 10 Monte Carlo runs are reported. The nominal error rate per read is p = 1%, and the average number of SNPs touched by each sample is L ∈ [6, 7]. The number of SNPs n ranges from 34240 to 191829, with the sample size m from 102633 to 574189. Here, we split all vertices into overlapping subsets of size W = 100. |
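
The setup quoted under Experiment Setup (success rates averaged over 10 Monte Carlo runs, vertices split into overlapping subsets of size W = 100) can be sketched roughly as follows. This is a minimal illustration, not the authors' code, since the paper releases none. The function names, the window step of W/2, and the majority-vote trial are all assumptions introduced here. In particular, the majority-vote trial is only a stand-in for a real recovery algorithm such as Spectral-Stitching.

```python
import random


def overlapping_windows(n, w=100, step=50):
    """Split vertex indices 0..n-1 into overlapping (start, stop) ranges of size w.

    The step (and hence the overlap w - step) is an assumption; the quoted
    text only states the subset size W = 100.
    """
    if not 0 < step <= w:
        raise ValueError("step must satisfy 0 < step <= w")
    if n <= w:
        return [(0, n)]
    starts = list(range(0, n - w + 1, step))
    if starts[-1] + w < n:
        # Add a final window so the last vertices are covered.
        starts.append(n - w)
    return [(s, s + w) for s in starts]


def empirical_success_rate(trial, runs=10, seed=0):
    """Average the boolean outcome of trial(rng) over independent runs."""
    rng = random.Random(seed)
    successes = sum(1 for _ in range(runs) if trial(rng))
    return successes / runs


def majority_vote_trial(theta, repeats=15):
    """Hypothetical stand-in trial: recover one bit from noisy copies.

    Each copy is flipped independently with probability theta, mirroring
    the role of the error rate in the quoted simulations.
    """
    def trial(rng):
        truth = rng.randint(0, 1)
        votes = sum(truth ^ (rng.random() < theta) for _ in range(repeats))
        estimate = 1 if 2 * votes > repeats else 0
        return estimate == truth
    return trial


if __name__ == "__main__":
    windows = overlapping_windows(100_000, w=100, step=50)
    print(len(windows), windows[0], windows[-1])
    print(empirical_success_rate(majority_vote_trial(theta=0.10), runs=10))
```

Fixing the seed makes each reported point repeatable. To reproduce a curve from the paper, the stand-in trial would need to be replaced with an actual implementation of Algorithm 1 or Algorithm 2.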