Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Revisiting Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model

Authors: Kaito Ariu, Alexandre Proutiere, Se-Young Yun

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we present numerical evaluations of the proposed algorithm. Our experiments are based on the code of (Wang et al., 2021), and we consider the three scenarios proposed in (Gao et al., 2017), as well as an additional scenario. The main focus of our comparison is the IAC algorithm (Algorithm 1) and a computationally efficient version of the penalized local maximum likelihood estimation (PLMLE) algorithm (Algorithm 3 in (Gao et al., 2017)).
Researcher Affiliation Collaboration 1Cyber Agent 2KTH, Digital Futures 3KAIST. Correspondence to: Kaito Ariu <EMAIL>.
Pseudocode Yes Algorithm 1: Instance-Adaptive Clustering; Algorithm 2: Spectral Clustering
Open Source Code No The paper does not provide an explicit statement about releasing its own source code, nor does it provide a link to a code repository for the methodology described. It only mentions that "Our experiments are based on the code of (Wang et al., 2021)", which refers to third-party code.
Open Datasets Yes We applied our algorithm to a real-world dataset, the DBLP citation network dataset (Backstrom et al., 2006).
Dataset Splits No The paper describes the DBLP citation network dataset, including the number of researchers (246) and co-authorship connections (1,118). It states that "The researchers in this network were clustered using both IAC and PLMLE, with the number of clusters set to 8 for simplicity." However, it does not provide training/test/validation splits, per-split sample counts, or a reference to predefined splits needed for reproducibility.
Hardware Specification Yes The simulations presented in this paper were conducted using the following computational environment. Operating System: macOS Sonoma; Programming Language: MATLAB; Processor: Apple M3 Max; Memory: 128 GB.
Software Dependencies No The paper mentions the programming language: MATLAB. However, it does not specify a version number for MATLAB nor does it list any versioned libraries or solvers, which are required for a reproducible description of ancillary software.
Experiment Setup Yes Our experiments are based on the code of (Wang et al., 2021), and we consider the three scenarios proposed in (Gao et al., 2017), as well as an additional scenario. The main focus of our comparison is the IAC algorithm (Algorithm 1) and a computationally efficient version of the penalized local maximum likelihood estimation (PLMLE) algorithm (Algorithm 3 in (Gao et al., 2017)). In all experiments, we consider simple SBMs with L = 1.
Model 1: Balanced Symmetric. ...n = 2500, K = 10, and L = 1. ...|Ik| = 250. ...p(k, k, 1) = 0.48 ...p(i, k, 1) = 0.32.
Model 2: Imbalanced. ...n = 2000, K = 4, and L = 1. ...|I1| = 200, |I2| = 400, |I3| = 600, and |I4| = 800. ...statistical parameter (p(i, k, 1))i,k as in (Gao et al., 2017).
Model 3: Sparse Symmetric. ...n = 4000, K = 10, and L = 1. ...|Ik| = 400. ...p(k, k, 1) = 0.032 ...p(i, k, 1) = 0.005.
Model 4: Sparse Asymmetric. ...n = 1200, K = 4, and L = 1. ...|Ik| = 300. ...statistical parameter (p(i, k, 1))i,k as.
In F.1 Experiments with Real-World Dataset: "...with the number of clusters set to 8 for simplicity."
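For readers checking the quoted setup, the Model 1 parameters (n = 2500, K = 10, |Ik| = 250, within-community edge probability p(k, k, 1) = 0.48, cross-community probability p(i, k, 1) = 0.32) fully determine a simple symmetric SBM. The sketch below is an illustrative sampler written for this report, not the authors' MATLAB code; the function name `sample_sbm` and its signature are assumptions.

```python
import numpy as np

def sample_sbm(sizes, p_in, p_out, rng):
    """Sample the adjacency matrix of a simple symmetric SBM.

    sizes : list of community sizes |I_1|, ..., |I_K|
    p_in  : edge probability within a community
    p_out : edge probability across communities
    Returns the (n x n) 0/1 adjacency matrix and the true labels.
    """
    n = sum(sizes)
    labels = np.repeat(np.arange(len(sizes)), sizes)
    # Edge-probability matrix: p_in on same-community pairs, p_out otherwise.
    P = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    # Sample each unordered pair once, then symmetrize; no self-loops.
    upper = np.triu(rng.random((n, n)) < P, k=1)
    A = (upper | upper.T).astype(int)
    return A, labels

rng = np.random.default_rng(0)
# Model 1 (Balanced Symmetric): K = 10 communities of 250 nodes each.
A, labels = sample_sbm([250] * 10, p_in=0.48, p_out=0.32, rng=rng)
```

The empirical within-community edge density of `A` should concentrate around 0.48 and the cross-community density around 0.32, which is a quick sanity check before running any recovery algorithm on the sampled graph.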