Why the Metric Backbone Preserves Community Structure
Authors: Maximilien Dreveton, Charbel Chucri, Matthias Grossglauser, Patrick Thiran
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To do so, we formally characterize the metric backbone of a weighted stochastic block model (wSBM). We assume that the vertices are separated into k blocks (also referred to as clusters or communities). An edge between two vertices belonging to blocks a and b is present with probability p_ab and is independent of the presence or absence of other edges. This edge, if present, has a weight sampled from a distribution with cdf F_ab, and this weight represents the cost of traveling through this edge. We denote by p^mb_ab the probability that an edge between a vertex in block a and a vertex in block b is present in the backbone. Under loose assumptions on the cost distributions F_ab and on the probabilities p_ab, we show that p^mb_ab / p^mb_cd = (1 + o(1)) p_ab / p_cd for every a, b, c, d ∈ [k]. This shows that the metric backbone thins the edge set approximately uniformly and, therefore, preserves the community structure of the original graph. Moreover, we also prove that a spectral algorithm recovers the communities almost exactly. We also conduct numerical experiments with several graph clustering algorithms and datasets to back up these theoretical results. |
| Researcher Affiliation | Academia | Maximilien Dreveton, EPFL, maximilien.dreveton@epfl.ch; Charbel Chucri, EPFL, charbel.chucri@epfl.ch; Matthias Grossglauser, EPFL, matthias.grossglauser@epfl.ch; Patrick Thiran, EPFL, patrick.thiran@epfl.ch |
| Pseudocode | Yes | Algorithm 1: Spectral Clustering on the weighted adjacency matrix of the metric backbone |
| Open Source Code | Yes | Code availability: We provide the code used for the experiments: https://github.com/Charbel-11/Why-the-Metric-Backbone-Preserves-Community-Structure |
| Open Datasets | Yes | We sample n = 10000 images from MNIST [26] and Fashion MNIST [42], and use the full HAR [3] dataset (n = 10299). |
| Dataset Splits | No | The paper mentions evaluating performance and averaging results over trials, but it does not explicitly specify training, validation, or test dataset splits with percentages, sample counts, or citations to predefined splits. |
| Hardware Specification | Yes | Computing Resources All experiments were run on a standard laptop with 16GB of RAM and 12th Gen Intel(R) Core(TM) i7-1250U CPU. |
| Software Dependencies | No | The paper mentions several software components, such as the 'igraph distances implementation [11]', the 'PyGSP package [13]', the 'graph-tool implementation', and 'scikit-learn', but it does not specify exact version numbers for these dependencies. |
| Experiment Setup | Yes | We use the graph-tool implementation for the Bayesian algorithm, with an exponential prior for the weight distributions. The Leiden algorithm is implemented at https://github.com/vtraag/leidenalg. For spectral clustering, we assume that the algorithm knows the correct number of clusters in advance, and we use the implementation from scikit-learn. |
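The metric backbone quoted above keeps exactly those edges whose cost equals the shortest-path distance between their endpoints, pruning "semi-metric" edges for which some indirect route is strictly cheaper. A minimal dense-matrix sketch of this construction is below; it is an illustration under assumed conventions (symmetric cost matrix, 0 meaning no edge), not the authors' implementation, which lives in the linked repository and uses the igraph distances routine instead.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def metric_backbone(W):
    """Return the metric backbone of a weighted graph.

    W: symmetric dense matrix of edge costs, with 0 meaning "no edge"
    (scipy's csgraph routines treat dense zeros as absent edges).
    An edge (i, j) survives iff its cost equals the shortest-path
    distance between i and j, i.e. no indirect route is strictly cheaper.
    """
    D = shortest_path(W, method="D", directed=False)  # Dijkstra, all pairs
    return np.where((W > 0) & np.isclose(W, D), W, 0.0)

# Toy example: in this triangle the cost-3 edge is semi-metric,
# because the two-hop route through vertex 1 costs only 1 + 1 = 2.
W = np.array([[0., 1., 3.],
              [1., 0., 1.],
              [3., 1., 0.]])
B = metric_backbone(W)  # the cost-3 edge (0, 2) is pruned
```

Clustering the backbone then amounts to running sklearn's `SpectralClustering` on `B` instead of `W`; since the paper's result says the pruning is approximately uniform across block pairs, the community structure recovered from `B` should match that of `W`.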