Community Detection Guarantees using Embeddings Learned by Node2Vec
Authors: Andrew Davison, S. Carlyle Morgan, Owen G. Ward
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our main result shows that the use of k-means clustering on the embedding vectors produced by node2vec gives weakly consistent community recovery for the nodes in (degree corrected) stochastic block models. We demonstrate this result empirically for both real and simulated networks, and examine how this relates to other embedding tools and machine learning procedures for network data. |
| Researcher Affiliation | Academia | Andrew Davison Department of Statistics Columbia University New York, NY 10027 ad3395@columbia.eduS. Carlyle Morgan Department of Statistics University of Michigan Ann Arbor, MA 48109 scmorgan@umich.eduOwen G. Ward Department of Statistics and Actuarial Science Simon Fraser University Burnaby, British Columbia owen_ward@sfu.ca |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | All code required to reproduce all results is included in the code repository in the supplemental files. |
| Open Datasets | Yes | We first simulate data from the planted partition stochastic block model, SBM(n/κ, κ, p, q, ρn). We use two publicly available networks containing known community structure. We first consider a network of emails between 1005 members of a large research institution, available as part of the Stanford Network Analysis Project [31]. We also consider a widely used dataset of directed edges between 1490 U.S political blogs, collected before the 2004 elections [2]. |
| Dataset Splits | Yes | We then use the true community labels of 10% of these nodes to train a (multinomial) logistic regression classifier, and predict the class label for the remaining 90% of nodes in the network. |
| Hardware Specification | Yes | All experiments were run on a computing cluster utilising 4 cores of an Intel E5-2683 v4 Broadwell 2.1GHz CPU or similar with 2 GB of memory per core. |
| Software Dependencies | No | The paper mentions using 'node2vec' implementation but does not specify its version number or any other software with specific versions. |
| Experiment Setup | Yes | We use an embedding dimension of 64 and do not modify other default tuning parameters for the embedding procedure unless specified, so that p = q = 1. We investigate the role of these tuning parameters below, allowing them to vary as is considered in node2vec. We pass these embedding vectors into k-means clustering, where k = κ, the true number of communities present in the network. |