Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Mixing Time of Metropolis-Hastings for Bayesian Community Detection

Authors: Bumeng Zhuo, Chao Gao

JMLR 2021

Each entry below gives a reproducibility variable, its classified result, and the LLM response quoted as supporting evidence.
Research Type: Experimental
Evidence: "...followed by some numerical results demonstrating its competitive performance on simulated data sets in Section 4." From Section 4 (Numerical Results): "In this section, we study the numerical performance of the Metropolis-Hastings Algorithm 1, and the inverse temperature parameter ξ is set to be 1 unless otherwise specified. The initial label assignment vector is chosen such that half of the samples are labeled correctly, and the other half are labeled randomly. The same mechanism is also mentioned in Bickel and Chen (2009)."

Researcher Affiliation: Academia
Evidence: "Bumeng Zhuo EMAIL, Chao Gao EMAIL, Department of Statistics, University of Chicago, Chicago, IL 60637, USA"
Pseudocode: Yes
Evidence: Algorithm 1 (a Metropolis-Hastings algorithm for Bayesian community detection).
Input: adjacency matrix A ∈ {0,1}^{n×n}, number of communities K, initial community assignment Z_0, inverse temperature parameter ξ, maximum number of iterations T.
Output: community label assignment Z_T.
For each t ∈ {0, 1, 2, ..., T}: choose an index j ∈ [n] uniformly at random; randomly assign a new label to index j from the set [K] \ {Z_t(j)} to get a new assignment Z'; set Z_{t+1} = Z' with probability ρ(Z_t, Z') = min{1, Π_ξ(Z'|A) / Π_ξ(Z_t|A)}; otherwise set Z_{t+1} = Z_t.
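A minimal Python sketch of the single-site sampler described in Algorithm 1, under stated assumptions: `metropolis_hastings` and `sbm_log_lik` are hypothetical names, and a Bernoulli SBM log-likelihood with assumed known edge probabilities p and q stands in for the paper's posterior Π_ξ(·|A), which is not reproduced here.

```python
import numpy as np

def metropolis_hastings(A, K, Z0, log_post, xi=1.0, T=1000, rng=None):
    """Single-site Metropolis-Hastings over community labels (Algorithm 1 sketch).

    The acceptance probability min{1, Pi_xi(Z'|A) / Pi_xi(Z_t|A)} is evaluated
    in log space, with Pi_xi proportional to Pi(.|A)^xi.
    """
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    Z = np.asarray(Z0).copy()
    for _ in range(T):
        j = rng.integers(n)            # choose an index j in [n] uniformly at random
        new = rng.integers(K - 1)      # propose a label from [K] \ {Z[j]}
        if new >= Z[j]:
            new += 1
        Zp = Z.copy()
        Zp[j] = new
        log_rho = xi * (log_post(Zp, A) - log_post(Z, A))
        if np.log(rng.random()) < min(0.0, log_rho):
            Z = Zp                     # accept the proposed assignment
    return Z

def sbm_log_lik(Z, A, p=0.6, q=0.1):
    """Hypothetical surrogate for the posterior: SBM Bernoulli log-likelihood
    with assumed within-/between-community edge probabilities p and q."""
    same = (Z[:, None] == Z[None, :])
    P = np.where(same, p, q)
    iu = np.triu_indices(len(Z), k=1)  # count each unordered pair once
    a, pr = A[iu], P[iu]
    return float(np.sum(a * np.log(pr) + (1 - a) * np.log(1 - pr)))
```

Because the proposal (pick j uniformly, then a uniformly random different label) is symmetric, the acceptance ratio reduces to the posterior ratio, which is why no proposal terms appear in `log_rho`.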
Open Source Code: Yes
Evidence: "The code is available on https://github.com/zhuobumeng/MH_bayes_SBM."

Open Datasets: No
Evidence: All experiments use simulated networks: "Balanced networks. In this setting, we generate networks with 2500 nodes, and 5 communities, each of which consists of 500 nodes." "Heterogeneous networks. In this setting, we generate networks with 2000 nodes and 4 communities of sizes 200, 400, 600, and 800, respectively."

Dataset Splits: No
Evidence: The paper uses simulated data for its experiments and describes how networks are generated for each scenario (e.g., "Balanced networks. In this setting, we generate networks with 2500 nodes, and 5 communities..."). There is no mention of splitting an existing dataset into training, validation, or test sets.

Hardware Specification: No
Evidence: The paper gives no details about the hardware (e.g., CPU/GPU models, memory) used to run the numerical experiments.

Software Dependencies: No
Evidence: The paper links to its code but does not name the programming languages, libraries, or solvers, or their versions, used to implement the algorithm or conduct the experiments.
Experiment Setup: Yes
Evidence: "In this section, we study the numerical performance of the Metropolis-Hastings Algorithm 1, and the inverse temperature parameter ξ is set to be 1 unless otherwise specified. The initial label assignment vector is chosen such that half of the samples are labeled correctly, and the other half are labeled randomly. ... Balanced networks. In this setting, we generate networks with 2500 nodes, and 5 communities, each of which consists of 500 nodes. ... Heterogeneous networks. In this setting, we generate networks with 2000 nodes and 4 communities of sizes 200, 400, 600, and 800, respectively. The connectivity matrix is set as

  [0.50 0.29 0.35 0.25
   0.29 0.45 0.25 0.30
   0.35 0.25 0.50 0.35
   0.25 0.30 0.35 0.45].

... In each setting, we run 20 experiments with independent initializations and adjacency matrices, and the value of each block is the average number of misclassified samples in the 20 experiments."
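The heterogeneous setting quoted above can be sketched as follows; `generate_sbm` is a hypothetical helper, with the community sizes and connectivity matrix taken from the quoted setup, and the initialization follows the described mechanism (half the labels correct, the other half uniformly random).

```python
import numpy as np

def generate_sbm(sizes, B, rng=None):
    """Sample a symmetric, zero-diagonal adjacency matrix from a stochastic
    block model with community sizes `sizes` and connectivity matrix B."""
    rng = np.random.default_rng(rng)
    z = np.repeat(np.arange(len(sizes)), sizes)  # ground-truth labels
    P = B[z][:, z]                               # pairwise edge probabilities
    U = rng.random((len(z), len(z)))
    A = (np.triu(U, 1) < np.triu(P, 1)).astype(int)
    return A + A.T, z

# Heterogeneous setting: 2000 nodes, 4 communities of sizes 200/400/600/800.
B = np.array([[0.50, 0.29, 0.35, 0.25],
              [0.29, 0.45, 0.25, 0.30],
              [0.35, 0.25, 0.50, 0.35],
              [0.25, 0.30, 0.35, 0.45]])
A, z = generate_sbm([200, 400, 600, 800], B, rng=0)

# Initialization as described: half of the samples labeled correctly,
# the other half labeled uniformly at random.
rng = np.random.default_rng(1)
Z0 = z.copy()
half = rng.choice(len(z), size=len(z) // 2, replace=False)
Z0[half] = rng.integers(4, size=len(half))
```

Sampling only the strict upper triangle and symmetrizing guarantees an undirected graph with no self-loops, matching the A ∈ {0,1}^{n×n} adjacency matrix that Algorithm 1 takes as input.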