reproducibilityindex.ai

Community Detection Guarantees using Embeddings Learned by Node2Vec

Authors: Andrew Davison, S. Carlyle Morgan, Owen G. Ward

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our main result shows that the use of k-means clustering on the embedding vectors produced by node2vec gives weakly consistent community recovery for the nodes in (degree corrected) stochastic block models. We demonstrate this result empirically for both real and simulated networks, and examine how this relates to other embedding tools and machine learning procedures for network data.
Researcher Affiliation	Academia	Andrew Davison Department of Statistics Columbia University New York, NY 10027 ad3395@columbia.eduS. Carlyle Morgan Department of Statistics University of Michigan Ann Arbor, MA 48109 scmorgan@umich.eduOwen G. Ward Department of Statistics and Actuarial Science Simon Fraser University Burnaby, British Columbia owen_ward@sfu.ca
Pseudocode	No	The paper does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm'.
Open Source Code	Yes	All code required to reproduce all results is included in the code repository in the supplemental files.
Open Datasets	Yes	We first simulate data from the planted partition stochastic block model, SBM(n/κ, κ, p, q, ρn). We use two publicly available networks containing known community structure. We first consider a network of emails between 1005 members of a large research institution, available as part of the Stanford Network Analysis Project [31]. We also consider a widely used dataset of directed edges between 1490 U.S political blogs, collected before the 2004 elections [2].
Dataset Splits	Yes	We then use the true community labels of 10% of these nodes to train a (multinomial) logistic regression classifier, and predict the class label for the remaining 90% of nodes in the network.
Hardware Specification	Yes	All experiments were run on a computing cluster utilising 4 cores of an Intel E5-2683 v4 Broadwell 2.1GHz CPU or similar with 2 GB of memory per core.
Software Dependencies	No	The paper mentions using 'node2vec' implementation but does not specify its version number or any other software with specific versions.
Experiment Setup	Yes	We use an embedding dimension of 64 and do not modify other default tuning parameters for the embedding procedure unless specified, so that p = q = 1. We investigate the role of these tuning parameters below, allowing them to vary as is considered in node2vec. We pass these embedding vectors into k-means clustering, where k = κ, the true number of communities present in the network.