Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Graph Clustering: Block-models and model free results

Authors: Yali Wan, Marina Meila

NeurIPS 2016 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	6 Experimental evaluation. Given G, we obtain a clustering C0 by spectral clustering [15]. Then we calculate clustering C by perturbing C0 with gradually increasing noise. For each C, we construct PFM(C, G) and SBM(C, G) model, and further compute ε, δ and δ0. If δ ≤ δ0, C is guaranteed to be stable by the theorems. In the remainder of this section, we describe the data generating process for the simulated datasets and the results we obtained.
Researcher Affiliation	Academia	Yali Wan, Department of Statistics, University of Washington, Seattle, WA 98195-4322, USA, EMAIL; Marina Meilă, Department of Statistics, University of Washington, Seattle, WA 98195-4322, USA, EMAIL
Pseudocode	Yes	PFM Estimation Algorithm. Input: Graph G with $\hat{A}, \hat{D}, \hat{L}, \hat{Y}, \hat{\Lambda}$, clustering C with indicator matrix Z. Output: $(A, L) = \mathrm{PFM}(G, C)$. 1. Construct an orthogonal matrix derived from Z: $Y_Z = \hat{D}^{1/2} Z C^{-1/2}$, with $C = Z^T \hat{D} Z$ the column normalization of Z. (5) 2. Project $Y_Z$ on $\hat{Y}$ and perform Singular Value Decomposition: $F = Y_Z^T \hat{Y} = U \Sigma V^T$. (6) 3. Change basis in $\mathcal{R}(Y_Z)$ to align with $\hat{Y}$: $Y = Y_Z U V^T$. Complete Y to an orthonormal basis $[Y\ B]$ of $\mathbb{R}^n$. (7) 4. Construct Laplacian L and edge probability matrix A: $L = Y \hat{\Lambda} Y^T + (B B^T) \hat{L} (B B^T)$, $A = \hat{D}^{1/2} L \hat{D}^{1/2}$. (8)
Open Source Code	No	The paper does not provide explicit statements or links for open-source code for the described methodology.
Open Datasets	Yes	Political Blogs Dataset A directed network A of hyperlinks between weblogs on US politics, compiled from online directories by Adamic and Glance [2], where each blog is assigned a political leaning, liberal or conservative, based on its blog content. The network A contains 1490 blogs.
Dataset Splits	No	The paper describes dataset generation parameters and cluster sizes, but does not provide specific training, validation, or test dataset splits.
Hardware Specification	No	The paper does not provide specific hardware details used for running its experiments.
Software Dependencies	No	The paper does not provide specific software dependencies with version numbers.
Experiment Setup	No	The 'Experiment Setup' section describes the data generation process and computed quantities (ε, δ, δ0) but does not provide specific hyperparameter values or detailed training configurations for any model.
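
The PFM estimation steps quoted in the Pseudocode row can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code (the index records that none was released). It assumes definitions the quoted excerpt does not spell out: the input adjacency matrix is symmetric with strictly positive degrees, D̂ is the diagonal degree matrix, L̂ = D̂^(-1/2) Â D̂^(-1/2), and Ŷ, Λ̂ are the K eigenpairs of L̂ with the largest eigenvalues. The function name pfm_estimate and its arguments are hypothetical.

```python
import numpy as np


def pfm_estimate(A_hat, labels, K):
    """Sketch of the quoted PFM estimation steps (1)-(4).

    A_hat  : (n, n) symmetric adjacency matrix (assumed, positive degrees)
    labels : length-n integer cluster labels in {0, ..., K-1}, no empty cluster
    K      : number of clusters
    Returns (A, L) as in the quoted output line.
    """
    A_hat = np.asarray(A_hat, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n = A_hat.shape[0]

    # Assumed inputs: degrees and normalized matrix L_hat = D^{-1/2} A D^{-1/2}.
    d = A_hat.sum(axis=1)
    d_sqrt = np.sqrt(d)
    L_hat = A_hat / np.outer(d_sqrt, d_sqrt)

    # Assumed: Y_hat, Lam_hat are the K eigenpairs with largest eigenvalues.
    evals, evecs = np.linalg.eigh(L_hat)
    top = np.argsort(evals)[::-1][:K]
    Lam_hat, Y_hat = evals[top], evecs[:, top]

    # Step 1: Y_Z = D^{1/2} Z C^{-1/2}, where C = Z^T D Z is diagonal
    # (its entries are the cluster volumes).
    Z = np.zeros((n, K))
    Z[np.arange(n), labels] = 1.0
    c = Z.T @ d
    Y_Z = (d_sqrt[:, None] * Z) / np.sqrt(c)[None, :]

    # Step 2: F = Y_Z^T Y_hat = U Sigma V^T.
    U, _, Vt = np.linalg.svd(Y_Z.T @ Y_hat)

    # Step 3: rotate Y_Z toward Y_hat, then complete to an orthonormal basis.
    Y = Y_Z @ U @ Vt
    Q, _ = np.linalg.qr(Y, mode="complete")
    B = Q[:, K:]  # orthonormal basis of the complement of span(Y)

    # Step 4: L = Y Lam Y^T + (B B^T) L_hat (B B^T), A = D^{1/2} L D^{1/2}.
    P = B @ B.T
    L = Y @ np.diag(Lam_hat) @ Y.T + P @ L_hat @ P
    A = d_sqrt[:, None] * L * d_sqrt[None, :]
    return A, L
```

As a sanity check under these assumptions, passing a block-constant matrix together with its own block labels should return that matrix unchanged, since the cluster subspace then coincides with the leading eigenspace.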