Why the Metric Backbone Preserves Community Structure

Authors: Maximilien Dreveton, Charbel Chucri, Matthias Grossglauser, Patrick Thiran

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To do so, we formally characterize the metric backbone of a weighted stochastic block model (wSBM). We assume that the vertices are separated into k blocks (also referred to as clusters or communities). An edge between two vertices belonging to blocks a and b is present with probability p_ab, independently of the presence or absence of other edges. This edge, if present, has a weight sampled from a distribution with cdf F_ab, and this weight represents the cost of traveling through the edge. We denote by p_ab^mb the probability that an edge between a vertex in block a and a vertex in block b is present in the backbone. Under loose assumptions on the cost distributions F_ab and on the probabilities p_ab, we show that p_ab^mb / p_cd^mb = (1 + o(1)) p_ab / p_cd for every a, b, c, d ∈ [k]. This shows that the metric backbone thins the edge set approximately uniformly and therefore preserves the community structure of the original graph. Moreover, we prove that a spectral algorithm recovers the communities almost exactly. We also conduct numerical experiments with several graph clustering algorithms and datasets to back up these theoretical results.
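The construction described above keeps an edge if and only if its weight equals the shortest-path distance between its endpoints, so no detour through other edges is strictly cheaper. A minimal self-contained sketch (pure-Python Dijkstra run per edge; this is an illustration, not the authors' igraph-based code, and `metric_backbone` is a hypothetical helper name):

```python
import heapq
from collections import defaultdict

def dijkstra(adj, src):
    """Shortest-path distances from src in an undirected weighted graph."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def metric_backbone(edges):
    """Keep edge (u, v, w) iff w equals the shortest-path distance d(u, v),
    i.e. no alternative path is strictly cheaper than the edge itself."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    # d(u, v) <= w always holds (the edge itself is a path), so an edge is
    # in the backbone exactly when the distance is not strictly smaller.
    return [(u, v, w) for u, v, w in edges
            if dijkstra(adj, u).get(v, float("inf")) >= w - 1e-12]
```

For example, in a triangle with weights 1, 1, 3, the weight-3 edge is dropped because the two-hop path costs 2; the backbone keeps only the two unit edges.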
Researcher Affiliation | Academia | Maximilien Dreveton (EPFL, maximilien.dreveton@epfl.ch), Charbel Chucri (EPFL, charbel.chucri@epfl.ch), Matthias Grossglauser (EPFL, matthias.grossglauser@epfl.ch), Patrick Thiran (EPFL, patrick.thiran@epfl.ch)
Pseudocode | Yes | Algorithm 1: Spectral Clustering on the weighted adjacency matrix of the metric backbone
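Algorithm 1 applies spectral clustering to the backbone's weighted adjacency matrix; the paper uses scikit-learn's implementation. The NumPy-only sketch below is an illustrative stand-in under stated assumptions: embedding with the k eigenvectors of largest absolute eigenvalue and a deterministic farthest-point k-means initialization are simplifying choices of this sketch, not necessarily the paper's exact variant.

```python
import numpy as np

def spectral_clustering(W, k, n_iter=50):
    """Embed the nodes with the k leading eigenvectors of the weighted
    adjacency matrix W, then cluster the embedding with Lloyd's k-means."""
    vals, vecs = np.linalg.eigh(W)           # eigenvalues in ascending order
    idx = np.argsort(-np.abs(vals))[:k]      # k largest in absolute value
    X = vecs[:, idx]                         # n-by-k spectral embedding
    # deterministic farthest-point initialization of the k centers
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):                  # Lloyd iterations
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1),
                           axis=1)
        for c in range(k):
            if np.any(labels == c):          # skip empty clusters
                centers[c] = X[labels == c].mean(axis=0)
    return labels
```

On a graph of two dense blocks with no cross-block edges, the leading eigenvectors are supported on the blocks, so the embedding collapses each block to a single point and k-means recovers the planted partition exactly.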
Open Source Code | Yes | Code availability: We provide the code used for the experiments: https://github.com/Charbel-11/Why-the-Metric-Backbone-Preserves-Community-Structure
Open Datasets | Yes | We sample n = 10000 images from MNIST [26] and Fashion-MNIST [42], and use the full HAR [3] dataset (n = 10299).
Dataset Splits | No | The paper mentions evaluating performance and averaging results over trials, but it does not explicitly specify training, validation, or test dataset splits with percentages, sample counts, or citations to predefined splits.
Hardware Specification | Yes | Computing Resources: All experiments were run on a standard laptop with 16GB of RAM and a 12th Gen Intel(R) Core(TM) i7-1250U CPU.
Software Dependencies | No | The paper mentions several software components, such as the 'igraph distances implementation [11]', the 'PyGSP package [13]', the 'graph-tool implementation', and 'scikit-learn', but it does not specify exact version numbers for these dependencies.
Experiment Setup | Yes | We use the graph-tool implementation for the Bayesian algorithm, with an exponential prior for the weight distributions. The Leiden algorithm is implemented at https://github.com/vtraag/leidenalg. For spectral clustering, we assume that the algorithm knows the correct number of clusters in advance, and we use the implementation from scikit-learn.