Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Revisiting Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model
Authors: Kaito Ariu, Alexandre Proutiere, Se-Young Yun
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present numerical evaluations of the proposed algorithm. Our experiments are based on the code of (Wang et al., 2021), and we consider the three scenarios proposed in (Gao et al., 2017), as well as an additional scenario. The main focus of our comparison is the IAC algorithm (Algorithm 1) and a computationally efficient version of the penalized local maximum likelihood estimation (PLMLE) algorithm (Algorithm 3 in (Gao et al., 2017)). |
| Researcher Affiliation | Collaboration | 1Cyber Agent 2KTH, Digital Futures 3KAIST. Correspondence to: Kaito Ariu <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: Instance-Adaptive Clustering; Algorithm 2: Spectral Clustering |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its own source code, nor does it provide a link to a code repository for the methodology described. It only mentions that "Our experiments are based on the code of (Wang et al., 2021)", which refers to third-party code. |
| Open Datasets | Yes | We applied our algorithm to a real-world dataset, the DBLP citation network dataset (Backstrom et al., 2006). |
| Dataset Splits | No | The paper describes the DBLP citation network dataset, including the number of researchers (246) and co-authorship connections (1,118). It states that "The researchers in this network were clustered using both IAC and PLMLE, with the number of clusters set to 8 for simplicity." However, it does not provide training/test/validation split details, per-split sample counts, or a reference to predefined splits for the experiments conducted. |
| Hardware Specification | Yes | The simulations presented in this paper were conducted using the following computational environment. Operating System: macOS Sonoma; Programming Language: MATLAB; Processor: Apple M3 Max; Memory: 128 GB |
| Software Dependencies | No | The paper mentions the programming language: MATLAB. However, it does not specify a version number for MATLAB nor does it list any versioned libraries or solvers, which are required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | Our experiments are based on the code of (Wang et al., 2021), and we consider the three scenarios proposed in (Gao et al., 2017), as well as an additional scenario. The main focus of our comparison is the IAC algorithm (Algorithm 1) and a computationally efficient version of the penalized local maximum likelihood estimation (PLMLE) algorithm (Algorithm 3 in (Gao et al., 2017)). In all experiments, we consider simple SBMs with L = 1. Model 1: Balanced Symmetric. ...n = 2500, K = 10, and L = 1. ...|Ik| = 250. ...p(k, k, 1) = 0.48 ...p(i, k, 1) = 0.32. Model 2: Imbalanced. ...n = 2000, K = 4, and L = 1. ...|I1| = 200, |I2| = 400, |I3| = 600, and |I4| = 800. ...statistical parameter (p(i, k, 1))i,k as in (Gao et al., 2017). Model 3: Sparse Symmetric. ...n = 4000, K = 10, and L = 1. ...|Ik| = 400. ...p(k, k, 1) = 0.032 ...p(i, k, 1) = 0.005. Model 4: Sparse Asymmetric. ...n = 1200, K = 4, and L = 1. ...|Ik| = 300. ...statistical parameter (p(i, k, 1))i,k as. In F.1 Experiments with Real-World Dataset: "...with the number of clusters set to 8 for simplicity." |
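The setup described in the table above (e.g., Model 1: a balanced symmetric SBM with n = 2500, K = 10, within-cluster edge probability 0.48 and across-cluster probability 0.32) can be sketched in a few lines. The following is a minimal illustration, not the paper's MATLAB code: it samples a symmetric SBM adjacency matrix and runs a naive spectral clustering (leading eigenvectors plus Lloyd's k-means), standing in for the paper's Algorithm 2. The reduced `n = 500` and the helper names (`sample_sbm`, `spectral_cluster`) are this sketch's assumptions, chosen for speed.

```python
import numpy as np

def sample_sbm(n, K, p_in, p_out, rng):
    """Sample a symmetric SBM adjacency matrix with K balanced communities."""
    labels = np.repeat(np.arange(K), n // K)
    # Edge probability matrix: p_in within a community, p_out across.
    P = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    A = (rng.random((n, n)) < P).astype(float)
    A = np.triu(A, 1)          # keep upper triangle, no self-loops
    return A + A.T, labels     # symmetrize

def spectral_cluster(A, K, rng, n_iter=50):
    """Naive spectral clustering: top-K eigenvectors of A, then k-means."""
    vals, vecs = np.linalg.eigh(A)
    X = vecs[:, np.argsort(-np.abs(vals))[:K]]  # K leading eigenvectors
    centers = X[rng.choice(len(X), K, replace=False)]
    for _ in range(n_iter):  # plain Lloyd's iterations
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        z = d.argmin(1)
        for k in range(K):
            if (z == k).any():
                centers[k] = X[z == k].mean(0)
    return z

rng = np.random.default_rng(0)
n, K = 500, 10  # scaled down from the paper's n = 2500 for a quick run
A, labels = sample_sbm(n, K, p_in=0.48, p_out=0.32, rng=rng)
z = spectral_cluster(A, K, rng)
```

Recovered clusters `z` are only defined up to a permutation of labels, so any accuracy comparison against `labels` should minimize over label permutations, as is standard in the SBM literature.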