Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Graph Clustering: Block-models and model free results
Authors: Yali Wan, Marina Meila
NeurIPS 2016 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6 Experimental evaluation. Given G, we obtain a clustering C0 by spectral clustering [15]. Then we calculate clustering C by perturbing C0 with gradually increasing noise. For each C, we construct the PFM(C, G) and SBM(C, G) models, and further compute ε, δ and δ0. If δ ≤ δ0, C is guaranteed to be stable by the theorems. In the remainder of this section, we describe the data generating process for the simulated datasets and the results we obtained. |
| Researcher Affiliation | Academia | Yali Wan, Department of Statistics, University of Washington, Seattle, WA 98195-4322, USA, EMAIL; Marina Meilă, Department of Statistics, University of Washington, Seattle, WA 98195-4322, USA, EMAIL |
| Pseudocode | Yes | PFM Estimation Algorithm. Input: graph $G$ with $\hat{A}, \hat{D}, \hat{L}, \hat{Y}, \hat{\Lambda}$; clustering $\mathcal{C}$ with indicator matrix $Z$. Output: $(A, L) = \mathrm{PFM}(G, \mathcal{C})$. 1. Construct an orthogonal matrix derived from $Z$: $Y_Z = \hat{D}^{1/2} Z C^{-1/2}$, with $C = Z^T \hat{D} Z$ the column normalization of $Z$. (5) 2. Project $Y_Z$ on $\hat{Y}$ and perform Singular Value Decomposition: $F = Y_Z^T \hat{Y} = U \Sigma V^T$. (6) 3. Change basis in $\mathcal{R}(Y_Z)$ to align with $\hat{Y}$: $Y = Y_Z U V^T$. Complete $Y$ to an orthonormal basis $[Y \; B]$ of $\mathbb{R}^n$. (7) 4. Construct Laplacian $L$ and edge probability matrix $A$: $L = Y \hat{\Lambda} Y^T + (B B^T) \hat{L} (B B^T)$, $A = \hat{D}^{1/2} L \hat{D}^{1/2}$. (8) |
| Open Source Code | No | The paper does not provide explicit statements or links for open-source code for the described methodology. |
| Open Datasets | Yes | Political Blogs Dataset A directed network A of hyperlinks between weblogs on US politics, compiled from online directories by Adamic and Glance [2], where each blog is assigned a political leaning, liberal or conservative, based on its blog content. The network A contains 1490 blogs. |
| Dataset Splits | No | The paper describes dataset generation parameters and cluster sizes, but does not provide specific training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | No | The 'Experiment Setup' section describes the data generation process and computed quantities (ε, δ, δ0) but does not provide specific hyperparameter values or detailed training configurations for any model. |
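
The four steps quoted in the Pseudocode row can be sketched in NumPy as follows. This is a minimal illustration and not the authors' code (the table notes that none was released). It rests on assumptions that the quoted text does not state: that $\hat{L} = \hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2}$, that $\hat{Y}$ and $\hat{\Lambda}$ are the $K$ eigenpairs of $\hat{L}$ with the largest eigenvalue magnitude, and that every node has positive degree. The function name `pfm_estimate` and its arguments are hypothetical. The completion $B$ is never formed explicitly; the sketch uses the identity $B B^T = I - Y Y^T$, which holds whenever $[Y \; B]$ is an orthonormal basis.

```python
import numpy as np


def pfm_estimate(A_hat, labels, K):
    """Sketch of (A, L) = PFM(G, C) for a symmetric adjacency matrix A_hat.

    labels: integer cluster label in {0, ..., K-1} for each node.
    Assumptions (not stated in the quoted pseudocode): L_hat is the
    normalized matrix D^{-1/2} A D^{-1/2}, and Y_hat / Lam_hat are its K
    eigenpairs of largest absolute eigenvalue.
    """
    A_hat = np.asarray(A_hat, dtype=float)
    labels = np.asarray(labels)
    n = A_hat.shape[0]

    d = A_hat.sum(axis=1)                      # node degrees (diagonal of D_hat)
    d_sqrt = np.sqrt(d)
    L_hat = A_hat / np.outer(d_sqrt, d_sqrt)   # assumed definition of L_hat

    evals, evecs = np.linalg.eigh(L_hat)
    top = np.argsort(-np.abs(evals))[:K]       # assumed ordering by magnitude
    Lam_hat = np.diag(evals[top])
    Y_hat = evecs[:, top]

    # Step 1: Y_Z = D^{1/2} Z C^{-1/2}, with C = Z^T D Z (diagonal for an indicator Z).
    Z = np.eye(K)[labels]                      # n x K cluster indicator matrix
    c = Z.T @ d                                # diagonal entries of C
    Y_Z = (d_sqrt[:, None] * Z) / np.sqrt(c)

    # Step 2: project Y_Z on Y_hat and take the SVD.
    F = Y_Z.T @ Y_hat
    U, _, Vt = np.linalg.svd(F)

    # Step 3: rotate the basis of range(Y_Z) to align with Y_hat.
    Y = Y_Z @ U @ Vt
    P_perp = np.eye(n) - Y @ Y.T               # equals B B^T

    # Step 4: build the Laplacian L and the edge probability matrix A.
    L = Y @ Lam_hat @ Y.T + P_perp @ L_hat @ P_perp
    A = d_sqrt[:, None] * L * d_sqrt[None, :]
    return A, L
```

A quick sanity check on this sketch: when `A_hat` is itself block-constant with respect to `labels` (for example, an expected adjacency matrix of a two-block model with equal block sizes), the aligned basis `Y` coincides with `Y_hat` and the returned `A` should reproduce `A_hat` up to floating-point error.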