Exact recovery and Bregman hard clustering of node-attributed Stochastic Block Model

Authors: Maximilien Dreveton, Felipe Fernandes, Daniel Figueiredo

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive numerical experiments on synthetic data indicate that the proposed algorithm outperforms classic algorithms that leverage only network or only attribute information, as well as state-of-the-art algorithms that also leverage both sources of information. Numerical results and comparisons to prior work are presented in Section 5. Figure 1 compares the performance of Algorithm 1 in terms of exact recovery (the fraction of runs in which the algorithm correctly recovers the community of every node) with the theoretical threshold for exact recovery proved in the paper (red curve in the plots). Real datasets are also used for evaluation.
Researcher Affiliation | Academia | Maximilien Dreveton, School of Computer and Communication Sciences, EPFL, Lausanne, Switzerland (maximilien.dreveton@epfl.ch); Felipe S. Fernandes, Systems Engineering and Computer Science, Federal University of Rio de Janeiro, Brazil (felipesc@cos.ufrj.br); Daniel R. Figueiredo, Systems Engineering and Computer Science, Federal University of Rio de Janeiro, Brazil (daniel@cos.ufrj.br).
Pseudocode | Yes | Algorithm 1: Bregman hard clustering for node-attributed SBM.
Open Source Code | No | The paper does not provide any statement about an open-source code release or a link to a code repository.
Open Datasets | Yes | Three benchmark datasets were used to evaluate and compare the proposed algorithm: CiteSeer (n = 3279, m = 9104, K = 6, d = 3703), Cora (n = 2708, m = 10556, K = 7, d = 1433), and Cornell (n = 183, m = 298, K = 5, d = 1703), all available in PyTorch Geometric.
Dataset Splits | No | The paper describes the synthetic data generation process and reports the total size (n) of the real datasets, but it does not specify training, validation, or test splits. Performance metrics (e.g., Adjusted Rand Index) are evaluated on the datasets, implying some form of evaluation, but the split methodology is not explicitly stated.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions PyTorch Geometric in the context of datasets but does not specify its version or any other software, library, or solver versions used in the experiments.
Experiment Setup | Yes | We take n = 600, K = 2, f_in = Ber(a log(n)/n), f_out = Ber(5 log(n)/n), and 2-dimensional Gaussian attributes with unit variance and mean (r, 0). Results are averaged over 60 runs. For each real network, the original node attribute vector was reduced to dimension d = 10 by selecting the 10 best features according to the chi-square test. Algorithm 1 assumed a multivariate Gaussian distribution with d = 10 for node attributes and Bernoulli edges (these networks have no edge weights). The initialization for Algorithm 1 and attSBM used spectral clustering of both the node similarity matrix (built from node attributes) and the network edges.
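The table names Algorithm 1 (Bregman hard clustering) but does not reproduce it. As a rough illustration of the idea only — not the paper's implementation — a minimal alternating loop for the setting described above (Bernoulli edges, whose Bregman divergence is the Bernoulli KL, and unit-variance Gaussian attributes, whose Bregman divergence is the squared Euclidean distance) might look like this; all function names here are hypothetical:

```python
import math

def bregman_bernoulli(a, p):
    # KL divergence from Bernoulli(a), a in {0, 1}, to Bernoulli(p),
    # with p clamped away from {0, 1} for numerical safety.
    eps = 1e-9
    p = min(max(p, eps), 1 - eps)
    return -math.log(p) if a == 1 else -math.log(1 - p)

def bregman_gaussian(x, mu):
    # Squared Euclidean distance: the Bregman divergence associated
    # with unit-variance Gaussian attributes.
    return sum((xi - mi) ** 2 for xi, mi in zip(x, mu))

def bregman_hard_clustering(A, X, K, z, n_iter=20):
    """Alternating estimation/assignment, in the spirit of Algorithm 1.

    A : n x n binary adjacency (list of lists), X : n x d attributes,
    z : initial hard assignment (list of ints in 0..K-1).
    """
    n, d = len(A), len(X[0])
    for _ in range(n_iter):
        # Estimation step: block probabilities and attribute means from z.
        cnt = [z.count(k) for k in range(K)]
        mu = [[sum(X[i][t] for i in range(n) if z[i] == k) / max(cnt[k], 1)
               for t in range(d)] for k in range(K)]
        p = [[0.0] * K for _ in range(K)]
        for k in range(K):
            for l in range(K):
                pairs = [(i, j) for i in range(n) for j in range(n)
                         if i != j and z[i] == k and z[j] == l]
                if pairs:
                    p[k][l] = sum(A[i][j] for i, j in pairs) / len(pairs)
        # Assignment step: move each node to the divergence-minimizing cluster.
        new_z = []
        for i in range(n):
            costs = []
            for k in range(K):
                c = bregman_gaussian(X[i], mu[k])
                c += sum(bregman_bernoulli(A[i][j], p[k][z[j]])
                         for j in range(n) if j != i)
                costs.append(c)
            new_z.append(min(range(K), key=costs.__getitem__))
        if new_z == z:
            break
        z = new_z
    return z
```

Reassigning each node against the other nodes' current labels, as done here, is one common convention; the paper's exact update order may differ.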
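The synthetic setup above (within-community edges ~ Ber(a log(n)/n), across ~ Ber(5 log(n)/n), 2-dimensional unit-variance Gaussian attributes with community-dependent mean (r, 0)) can be sketched as follows. This is an illustrative sampler, not the authors' code; placing the two attribute means at (+r, 0) and (-r, 0) is an assumption, as is the balanced community assignment:

```python
import math
import random

def sample_attributed_sbm(n=600, a=8.0, b=5.0, r=1.0, seed=0):
    """Sample a node-attributed SBM with two balanced communities.

    Within-community edges ~ Ber(a*log(n)/n), across ~ Ber(b*log(n)/n).
    Attributes: 2-D unit-variance Gaussians with mean (+r, 0) or (-r, 0)
    depending on the community (the +/- convention is an assumption).
    """
    rng = random.Random(seed)
    z = [i % 2 for i in range(n)]              # balanced ground-truth labels
    p_in, p_out = a * math.log(n) / n, b * math.log(n) / n
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if z[i] == z[j] else p_out
            if rng.random() < p:
                A[i][j] = A[j][i] = 1          # undirected, no self-loops
    X = [(rng.gauss(r if z[i] == 0 else -r, 1.0), rng.gauss(0.0, 1.0))
         for i in range(n)]
    return A, X, z
```

Exact-recovery curves like those in Figure 1 would then be obtained by running the clustering algorithm on many independent samples (the report mentions 60 runs) and recording the fraction in which every label is recovered.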