Exact recovery and Bregman hard clustering of node-attributed Stochastic Block Model
Authors: Maximilien Dreveton, Felipe Fernandes, Daniel Figueiredo
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive numerical experiments using synthetic data indicate that the proposed algorithm outperforms classic algorithms that leverage only network or only attribute information as well as state-of-the-art algorithms that also leverage both sources of information. Numerical results and comparisons to prior works are presented in Section 5. We first compare in Figure 1 the performance of Algorithm 1 in terms of exact recovery (fraction of times the algorithm correctly recovers the community of all nodes) with the theoretical threshold for exact recovery proved in the paper (red curve in the plots). Evaluation using real datasets. |
| Researcher Affiliation | Academia | Maximilien Dreveton School of Computer and Communication Sciences EPFL, Lausanne, Switzerland maximilien.dreveton@epfl.ch Felipe S. Fernandes Systems Engineering and Computer Science Federal University of Rio de Janeiro Rio de Janeiro, Brazil felipesc@cos.ufrj.br Daniel R. Figueiredo Systems Engineering and Computer Science Federal University of Rio de Janeiro Rio de Janeiro, Brazil daniel@cos.ufrj.br |
| Pseudocode | Yes | Algorithm 1: Bregman hard clustering for node-attributed SBM. |
| Open Source Code | No | The paper does not provide any statement about open-source code release or links to a code repository. |
| Open Datasets | Yes | The following three benchmark datasets were used to evaluate and compare the proposed algorithm: CiteSeer (n = 3279, m = 9104, K = 6, d = 3703), Cora (n = 2708, m = 10556, K = 7, d = 1433), and Cornell (n = 183, m = 298, K = 5, d = 1703), all available in PyTorch Geometric. |
| Dataset Splits | No | The paper discusses the synthetic data generation process and mentions the total size (n) for real datasets, but it does not specify any training, validation, or test splits for these datasets. It evaluates performance metrics (e.g., Adjusted Rand Index) on the datasets, implying some form of evaluation, but the split methodology is not explicitly stated. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch Geometric' in the context of datasets but does not specify its version or any other software, libraries, or solver versions used in the experiments. |
| Experiment Setup | Yes | We take n = 600, K = 2, f_in = Ber(a n^{-1} log n), f_out = Ber(5 n^{-1} log n), and 2-dimensional Gaussian attributes with unit variance and mean (r, 0). Results are averaged over 60 runs. For each network, the original node attribute vector was reduced to have dimension d = 10 by selecting the 10 best features according to the chi-square test. Algorithm 1 assumed a multivariate Gaussian distribution with d = 10 for node attributes and Bernoulli edges (these networks have no edge weights). The initialization for Algorithm 1 and att SBM used spectral clustering of both the node similarity matrix (using node attributes) and network edges. |
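The synthetic setup above (n = 600, K = 2, Bernoulli edges with f_in = Ber(a n^{-1} log n) and f_out = Ber(5 n^{-1} log n), and 2-dimensional Gaussian attributes with unit variance and mean (r, 0)) can be sketched as a data generator. This is a minimal illustration, not the authors' code: the parameter names `a` and `r` follow the paper's notation, but the default values and the symmetric placement of the second community's attribute mean at (-r, 0) are assumptions.

```python
import numpy as np

def generate_attributed_sbm(n=600, K=2, a=8.0, r=1.0, seed=0):
    """Sketch of a node-attributed SBM sample matching the stated setup.

    Assumptions (not stated in the source): the values of `a` and `r`,
    and the mean of the second community, taken here as (-r, 0).
    """
    rng = np.random.default_rng(seed)
    # Random community assignment over K communities.
    labels = rng.integers(0, K, size=n)
    # Intra-/inter-community edge probabilities:
    # f_in = Ber(a * log(n) / n), f_out = Ber(5 * log(n) / n).
    p_in = a * np.log(n) / n
    p_out = 5 * np.log(n) / n
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p_in, p_out)
    # Sample the upper triangle and symmetrize (undirected, no self-loops).
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adjacency = (upper | upper.T).astype(int)
    # 2-dimensional Gaussian attributes with unit variance per coordinate;
    # community 0 centered at (r, 0), community 1 at (-r, 0) (assumption).
    means = np.array([[r, 0.0], [-r, 0.0]])
    attributes = means[labels] + rng.standard_normal((n, 2))
    return adjacency, attributes, labels
```

Averaging the exact-recovery indicator of a clustering algorithm over many such samples (the paper uses 60 runs) reproduces the kind of curve compared against the theoretical threshold in Figure 1.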