Thinned random measures for sparse graphs with overlapping communities

Authors: Federica Zoe Ricci, Michele Guindani, Erik Sudderth

NeurIPS 2022

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Experiments show recovery of communities for networks with thousands of nodes. We compare the performance of our TGGP framework for adding overlapping communities to the GGP model with the approaches detailed in Sec. 2.2 and 2.1: the CGGP (sparse with mixed memberships) [29], the SBM-GGP model (sparse with single memberships) [28], and the more classical SBM (dense with single membership) [2] and MMSB (dense with mixed memberships) [5]. We run 50,000 iterations of the five models' MCMC samplers on four real-world networks; see Appendix for data sources and pre-processing. Each model was fit to fully observed data to learn node-specific parameters (e.g., sociabilities and community memberships) and community-interaction probabilities, using the values from the last MCMC iteration. Two different measures of posterior predictive accuracy [43] were then used to assess the goodness of model fit.
Researcher Affiliation | Academia | Federica Zoe Ricci, Department of Statistics, University of California, Irvine, CA, USA, fzricci@uci.edu; Michele Guindani, Department of Biostatistics, University of California, Los Angeles, CA, USA, mguindani@g.ucla.edu; Erik B. Sudderth, Departments of Computer Science and Statistics, University of California, Irvine, CA, USA, sudderth@uci.edu
Pseudocode | No | The paper describes the steps of its Monte Carlo posterior inference and Gibbs sampling strategy in prose, but it does not include a formally structured pseudocode block or algorithm figure.
Open Source Code | Yes | Code can be found on the first author's website: https://federicazoe.github.io/
Open Datasets | Yes | We run 50,000 iterations of the five models' MCMC samplers on four real-world networks; see Appendix for data sources and pre-processing. The datasets are publicly available at the links provided in the Appendix. From the checklist: If your work uses existing assets, did you cite the creators? [Yes] See Appendix C. Did you mention the license of the assets? [Yes] See Appendix C.
Dataset Splits | No | The paper describes how specific entries of the adjacency matrix were randomly selected or obscured for evaluation (e.g., "randomly selected 5% of the entries equal to 1", "randomly obscure 5% of all entries"), and it reports MCMC iteration counts for training and burn-in. However, it does not provide explicit training/validation/test dataset splits (e.g., 80/10/10 percentages or sample counts) for the overall datasets.
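The held-out evaluation described above (masking a fraction of adjacency entries rather than splitting nodes) can be sketched as below. This is an illustrative sketch only; the function name, NaN convention, and symmetric-masking choice are assumptions, not taken from the authors' code.

```python
import numpy as np

def obscure_entries(adjacency, frac=0.05, seed=None):
    """Randomly obscure a fraction of upper-triangular adjacency entries,
    mimicking the paper's held-out evaluation (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    A = adjacency.astype(float).copy()
    # Each unordered node pair appears once in the upper triangle.
    rows, cols = np.triu_indices(A.shape[0], k=1)
    n_hidden = int(frac * len(rows))
    idx = rng.choice(len(rows), size=n_hidden, replace=False)
    # NaN marks entries treated as unobserved during model fitting.
    A[rows[idx], cols[idx]] = np.nan
    A[cols[idx], rows[idx]] = np.nan  # keep the undirected network symmetric
    return A, list(zip(rows[idx], cols[idx]))
```

Posterior predictive accuracy would then be scored on the hidden pairs, comparing model-predicted link probabilities against the true obscured entries.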
Hardware Specification | No | We used an internal cluster and do not have a detailed record of computation times, but they were not substantial (at most a few days for each evaluated model). This statement does not specify exact hardware components (e.g., specific GPU or CPU models).
Software Dependencies | No | The paper does not list specific software dependencies (e.g., libraries, frameworks, or solvers) with version numbers that would be needed to reproduce the experiments.
Experiment Setup | Yes | We then run our MCMC sampler for 50,000 iterations, discarding the first 40,000 samples as burn-in. For our model fitting, we let 50 be the upper bound on the number of communities and set γ = 10 and ζ = 0.5 to allow learning the number of communities. Inference uses the true number of communities for the CGGP, while our TGGP is given a loose upper bound of K = 50, γ = 10, ζ = 0.2.
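The reported setup (50,000 iterations with the first 40,000 discarded as burn-in, K = 50 upper bound, γ = 10, ζ = 0.5) corresponds to a generic Gibbs-style loop of the following shape. The `step` function is a placeholder for the model's transition update, not the authors' sampler.

```python
import numpy as np

N_ITER = 50_000   # total MCMC iterations (as reported)
BURN_IN = 40_000  # initial samples discarded as burn-in (as reported)
K_MAX = 50        # upper bound on the number of communities
GAMMA, ZETA = 10.0, 0.5  # hyperparameters reported for model fitting

def run_mcmc(step, init_state, n_iter=N_ITER, burn_in=BURN_IN, seed=0):
    """Generic MCMC loop: apply `step` n_iter times and keep only the
    post-burn-in samples. `step(state, rng)` is a placeholder update."""
    rng = np.random.default_rng(seed)
    state, kept = init_state, []
    for t in range(n_iter):
        state = step(state, rng)
        if t >= burn_in:
            kept.append(state)
    return kept
```

Under this scheme, 10,000 post-burn-in samples remain; the paper additionally reports using the values from the last MCMC iteration to fix node-specific parameters for its posterior predictive checks.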