Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Spectral Algorithms for Community Detection in Directed Networks

Authors: Zhe Wang, Yingbin Liang, Pengsheng Ji

JMLR 2020 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we conduct experimental studies to compare the performance of six spectral clustering algorithms, namely, D-SCORE, D-SCOREq, rD-SCORE, rD-SCOREq, oPCA, rPCA, and two likelihood algorithms APL (Amini et al., 2013) and BCPL (Bickel and Chen, 2009b). We compare these eight algorithms on the web blogs data and the experiments on simulated data.
Researcher Affiliation	Academia	Zhe Wang Department of Electrical and Computer Engineering The Ohio State University Columbus, OH 43202, USA EMAIL Yingbin Liang Department of Electrical and Computer Engineering The Ohio State University Columbus, OH 43202, USA EMAIL Pengsheng Ji Department of Statistics University of Georgia Athens, GA 30602, USA EMAIL
Pseudocode	Yes	Algorithm 1: D-SCORE(Û, V̂, K); Algorithm 2: D-SCOREq(Û, V̂, K); Algorithm 3: Improved D-SCOREq(K, A) using intersection-with-attachment; Algorithm 4: oPCA; Algorithm 5: Regularized graph Laplacian
Open Source Code	No	The paper does not provide explicit links to source code repositories or statements confirming the release of their implementation code for the described methodology. The provided license link (https://creativecommons.org/licenses/by/4.0/) is for the paper itself, not the code.
Open Datasets	Yes	In this subsection, we apply the above mentioned eight algorithms to the web blogs data introduced in Adamic and Glance (2005). In this subsection, we apply the above mentioned eight algorithms to the email-Eu-core network introduced in Leskovec and Krevl (2014). The email data was collected from a large European research institution, and a directed edge from node i to node j indicates that person i has sent at least one email to person j. Clearly, the email-Eu-core network is also a directed network.
Dataset Splits	No	In our experiment, we first extract the largest component of the graph, which contains 1222 nodes... We repeat each algorithm on each setting 500 times and take the mean of the total number of misclustered nodes. In this subsection, we apply the above mentioned eight algorithms to the email-Eu-core network introduced in Leskovec and Krevl (2014). The email data was collected from a large European research institution, and a directed edge from node i to node j indicates that person i has sent at least one email to person j. Clearly, the email-Eu-core network is also a directed network. There are many communities in this network, but we extract the top 4 largest communities which contains 297 nodes as the entire graph and 252 nodes in intersection graph. We repeat the experiment 500 times and show the mean error in table 2.
Hardware Specification	No	The paper does not explicitly describe any specific hardware (e.g., GPU models, CPU types, or cloud computing instances) used for running the experiments or simulations.
Software Dependencies	No	The paper does not provide specific version numbers for any software libraries, programming languages, or tools used in the implementation of the algorithms or experiments.
Experiment Setup	Yes	Fix a threshold $T_n = \log n$ (used to avoid zero denominator), define the $n \times (K-1)$ ratio matrices $R_{\hat{U}}$ and $R_{\hat{V}}$, such that for $1 \le i \le n$, $1 \le k \le K-1$... (Algorithm 1) The regularization parameter $\tau$ is usually set as the average degree $\tau = \sum_{i,j=1}^{n} A(i, j)/n$. (Algorithm 5) We repeat each algorithm on each setting 500 times and take the mean of the total number of misclustered nodes.
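
The two quantities quoted in the Experiment Setup entry, the threshold T_n = log n from Algorithm 1 and the average-degree regularization parameter τ from Algorithm 5, can be illustrated with a minimal sketch. This is not the authors' code (the record above notes that none was released); the function names `average_degree` and `ratio_threshold` and the 4-node adjacency matrix are hypothetical, chosen only for illustration.

```python
import math


def average_degree(adjacency):
    """Return tau = sum_{i,j} A(i, j) / n for a square adjacency matrix.

    `adjacency` is a list of n rows, each a list of n edge indicators.
    """
    n = len(adjacency)
    if n == 0 or any(len(row) != n for row in adjacency):
        raise ValueError("adjacency must be a non-empty square matrix")
    return sum(sum(row) for row in adjacency) / n


def ratio_threshold(n):
    """Return T_n = log n, the threshold quoted from Algorithm 1."""
    return math.log(n)


# Hypothetical 4-node directed graph: A[i][j] = 1 means an edge i -> j.
A = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]

tau = average_degree(A)        # total number of edges divided by n
t_n = ratio_threshold(len(A))  # natural log of the number of nodes
```

The sketch uses plain Python lists so that it runs without third-party packages; with a NumPy array the same τ would be `A.sum() / A.shape[0]`.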