Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Revisiting Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model

Authors: Kaito Ariu, Alexandre Proutiere, Se-Young Yun

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we present numerical evaluations of the proposed algorithm. Our experiments are based on the code of (Wang et al., 2021), and we consider the three scenarios proposed in (Gao et al., 2017), as well as an additional scenario. The main focus of our comparison is the IAC algorithm (Algorithm 1) and a computationally efficient version of the penalized local maximum likelihood estimation (PLMLE) algorithm (Algorithm 3 in (Gao et al., 2017)).
Researcher Affiliation Collaboration 1Cyber Agent 2KTH, Digital Futures 3KAIST. Correspondence to: Kaito Ariu <EMAIL>.
Pseudocode Yes Algorithm 1: Instance-Adaptive Clustering; Algorithm 2: Spectral Clustering
Open Source Code No The paper does not provide an explicit statement about releasing its own source code, nor does it provide a link to a code repository for the methodology described. It only mentions that "Our experiments are based on the code of (Wang et al., 2021)", which refers to third-party code.
Open Datasets Yes We applied our algorithm to a real-world dataset, the DBLP citation network dataset (Backstrom et al., 2006).
Dataset Splits No The paper describes the DBLP citation network dataset, including the number of researchers (246) and co-authorship connections (1,118). It states that "The researchers in this network were clustered using both IAC and PLMLE, with the number of clusters set to 8 for simplicity." However, it does not provide training/test/validation splits, per-split sample counts, or a reference to predefined splits needed for reproducibility.
Hardware Specification Yes The simulations presented in this paper were conducted using the following computational environment. Operating System: macOS Sonoma; Programming Language: MATLAB; Processor: Apple M3 Max; Memory: 128 GB.
Software Dependencies No The paper mentions the programming language: MATLAB. However, it does not specify a version number for MATLAB nor does it list any versioned libraries or solvers, which are required for a reproducible description of ancillary software.
Experiment Setup Yes Our experiments are based on the code of (Wang et al., 2021), and we consider the three scenarios proposed in (Gao et al., 2017), as well as an additional scenario. The main focus of our comparison is the IAC algorithm (Algorithm 1) and a computationally efficient version of the penalized local maximum likelihood estimation (PLMLE) algorithm (Algorithm 3 in (Gao et al., 2017)). In all experiments, we consider simple SBMs with L = 1.
Model 1: Balanced Symmetric. ...n = 2500, K = 10, and L = 1. ...|Ik| = 250. ...p(k, k, 1) = 0.48 ...p(i, k, 1) = 0.32.
Model 2: Imbalanced. ...n = 2000, K = 4, and L = 1. ...|I1| = 200, |I2| = 400, |I3| = 600, and |I4| = 800. ...statistical parameter (p(i, k, 1))i,k as in (Gao et al., 2017).
Model 3: Sparse Symmetric. ...n = 4000, K = 10, and L = 1. ...|Ik| = 400. ...p(k, k, 1) = 0.032 ...p(i, k, 1) = 0.005.
Model 4: Sparse Asymmetric. ...n = 1200, K = 4, and L = 1. ...|Ik| = 300. ...statistical parameter (p(i, k, 1))i,k as.
In F.1 Experiments with Real-World Dataset: "...with the number of clusters set to 8 for simplicity."
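For readers checking the quoted setup, the Model 1 parameters (n = 2500, K = 10, |Ik| = 250, within-community edge probability p(k, k, 1) = 0.48, cross-community probability p(i, k, 1) = 0.32) fully determine a simple symmetric SBM. The sketch below is an illustrative sampler written for this report, not the authors' MATLAB code; the function name `sample_sbm` and its signature are assumptions.

```python
import numpy as np

def sample_sbm(sizes, p_in, p_out, rng):
    """Sample the adjacency matrix of a simple symmetric SBM.

    sizes : list of community sizes |I_1|, ..., |I_K|
    p_in  : edge probability within a community
    p_out : edge probability across communities
    Returns the (n x n) 0/1 adjacency matrix and the true labels.
    """
    n = sum(sizes)
    labels = np.repeat(np.arange(len(sizes)), sizes)
    # Edge-probability matrix: p_in on same-community pairs, p_out otherwise.
    P = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    # Sample each unordered pair once, then symmetrize; no self-loops.
    upper = np.triu(rng.random((n, n)) < P, k=1)
    A = (upper | upper.T).astype(int)
    return A, labels

rng = np.random.default_rng(0)
# Model 1 (Balanced Symmetric): K = 10 communities of 250 nodes each.
A, labels = sample_sbm([250] * 10, p_in=0.48, p_out=0.32, rng=rng)
```

The empirical within-community edge density of `A` should concentrate around 0.48 and the cross-community density around 0.32, which is a quick sanity check before running any recovery algorithm on the sampled graph.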