Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Clustering Partially Observed Graphs via Convex Optimization

Authors: Yudong Chen, Ali Jalali, Sujay Sanghavi, Huan Xu

JMLR 2014

Reproducibility Variable Result LLM Response
Research Type Experimental 4. Experimental Results We explore via simulation the performance of our algorithm as a function of the values of the model parameters (n, Kmin, p0, τ). We see that the performance matches well with the theory. In the experiment, each test case is constructed by generating a graph with n nodes divided into clusters of equal size Kmin, and then placing a disagreement on each pair of nodes with probability τ independently. Each node pair is then observed with probability p0. We then run Algorithm 1, where the optimization problem (1) is solved using the fast algorithm in Lin et al. (2009). We check if the algorithm successfully outputs a solution that equals the underlying true clusters.
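The quoted test-case construction can be sketched directly. The following is a minimal, illustrative generator for the described setup (equal clusters of size Kmin, disagreements flipped with probability τ, each pair observed with probability p0); all function and variable names here are our own, not the paper's:

```python
import numpy as np

def planted_partition(n, k_min, tau, p0, seed=0):
    """Sample one test case: n nodes in equal clusters of size k_min,
    flip each pair's agreement with prob. tau, observe each pair with
    prob. p0. Returns the observed matrix (+1 edge, -1 non-edge,
    0 unobserved) and the ground-truth labels."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(n // k_min), k_min)
    same = labels[:, None] == labels[None, :]            # ideal cluster structure
    ideal = np.where(same, 1, -1)
    flips = rng.random((n, n)) < tau                     # disagreements
    flips = np.triu(flips, 1); flips = flips | flips.T   # symmetrize
    noisy = np.where(flips, -ideal, ideal)
    mask = rng.random((n, n)) < p0                       # partial observation
    mask = np.triu(mask, 1); mask = mask | mask.T
    A = np.where(mask, noisy, 0)
    np.fill_diagonal(A, 0)
    return A, labels
```

A success probability curve like the paper's Figure 2 would then repeat this generation for each (p0, n) pair and count exact recoveries.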
Researcher Affiliation Academia Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 78712, USA; Department of Mechanical Engineering, National University of Singapore, Singapore 117575, Singapore
Pseudocode Yes Algorithm 1 Optimal-Cluster(A)
  λ ← 1/(32√(p0 n))
  while not terminated do
    Solve (1) to obtain the solution (B, K)
    if K is valid then
      Output the clustering in K and EXIT.
    else if trace(K) > n then
      λ ← λ/2
    else if trace(K) < n then
      λ ← 2λ
    end if
  end while
  Declare Failure.
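The outer loop of Algorithm 1 (adjusting λ until the solution of the convex program is a valid cluster matrix) can be sketched as follows. The convex program (1) itself is abstracted behind an injected `solve` callable, since the paper delegates it to the solver of Lin et al. (2009); the function names and the validity check below are our own illustration, not the paper's code:

```python
import numpy as np

def is_valid_cluster_matrix(K):
    """A valid cluster matrix is symmetric 0/1, has unit diagonal, and its
    1-entries form an equivalence relation; for block-diagonal K this is
    equivalent to K @ K having the same support as K."""
    if not np.array_equal(K, K.T):
        return False
    if not set(np.unique(K).tolist()) <= {0.0, 1.0}:
        return False
    if not np.all(np.diag(K) == 1):
        return False
    return np.array_equal((K @ K > 0).astype(float), K)

def optimal_cluster(A, solve, p0, max_iters=50):
    """Sketch of Algorithm 1's search over lambda. `solve(A, lam)` stands
    in for convex program (1) and must return a pair (B, K)."""
    n = A.shape[0]
    lam = 1.0 / (32.0 * np.sqrt(p0 * n))   # initial weight from Algorithm 1
    for _ in range(max_iters):
        B, K = solve(A, lam)
        if is_valid_cluster_matrix(K):     # valid clustering found: done
            return K
        tr = np.trace(K)
        if tr > n:                         # too many ones: shrink lambda
            lam /= 2.0
        elif tr < n:                       # too few: grow lambda
            lam *= 2.0
    return None                            # declare failure
```

Note that trace(K) = n exactly when every node is assigned unit "mass" on the diagonal, which is why the trace is used to steer λ.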
Open Source Code No The paper does not provide any specific links to source code repositories or statements about releasing the code for the described methodology. It mentions using 'the fast algorithm developed by Lin et al. (2009)', which refers to a third-party tool.
Open Datasets No The paper evaluates its algorithm on the 'classical Planted Partition/Stochastic Block Model' and states, 'each test case is constructed by generating a graph with n nodes divided into clusters of equal size Kmin'. This indicates the use of synthetically generated data based on a model rather than a specific publicly available dataset with a link or citation.
Dataset Splits No The paper describes generating synthetic graphs for its experiments ('each test case is constructed by generating a graph with n nodes divided into clusters of equal size Kmin') rather than using pre-defined dataset splits from a fixed public dataset. Therefore, traditional training/test/validation splits are not applicable or specified.
Hardware Specification No The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, or cloud resources) used to run the simulations described in the 'Experimental Results' section.
Software Dependencies No The paper mentions using 'the fast algorithm developed by Lin et al. (2009)' to solve the optimization problem (1), but it does not specify any version numbers for this or any other software dependencies.
Experiment Setup Yes In the first set of experiments, we fix τ = 0.2 and Kmin = n/4 and vary p0 and n. For each (p0, n), we repeat the experiment 5 times and plot the probability of success in the left pane of Figure 2. ... For spectral clustering, we first impute the missing entries of the adjacency matrix with either zeros or random 1/0s. We then compute the first k principal components of the adjacency matrix, and run k-means clustering on the principal components (von Luxburg, 2007); here we set k equal to the number of clusters.
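The spectral-clustering baseline quoted above (impute missing entries, take the top-k principal components of the adjacency matrix, run k-means) can be sketched in a few lines. This is our own minimal version: the zero imputation matches one of the paper's two variants, but the deterministic farthest-point k-means initialization is our choice, not the paper's:

```python
import numpy as np

def spectral_baseline(A, k):
    """Baseline: impute missing entries (NaN) with zeros, embed nodes via
    the top-k eigenvectors of the adjacency matrix, then run a tiny
    Lloyd's k-means on the embedding. Returns cluster labels."""
    A = np.where(np.isnan(A), 0.0, A)            # impute missing with zeros
    vals, vecs = np.linalg.eigh(A)
    X = vecs[:, np.argsort(-np.abs(vals))[:k]]   # top-k components
    # Deterministic farthest-point initialization (illustrative choice).
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    C = np.array(centers)
    for _ in range(100):                         # Lloyd iterations
        labels = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)
    return labels
```

On a clean two-clique graph this recovers the planted clusters exactly; the paper's point is that its performance degrades under heavy missingness, where the convex program remains accurate.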