Asymptotics of $\ell_2$ Regularized Network Embeddings

Authors: Andrew Davison

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We now examine the performance of using regularized node2vec embeddings for link prediction and node classification tasks, and illustrate comparable, when not superior, performance to more complicated encoders for network embeddings. We perform experiments on the Cora, CiteSeer and PubMed Diabetes citation network datasets."
Researcher Affiliation | Academia | "Andrew Davison, Department of Statistics, Columbia University, New York, NY 10027, ad3395@columbia.edu"
Pseudocode | Yes | "Algorithm 1 (Uniform vertex sampling). Given a graph G_n and a number of samples k, we select k vertices from G_n uniformly and without replacement, and then return S(G_n) as the subgraph induced by the sampled vertices."
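Algorithm 1 is simple enough to sketch directly. The following is a minimal Python rendering of uniform vertex sampling, not the authors' implementation; the function name and the adjacency-dict graph representation are our own choices.

```python
import random

def uniform_vertex_sample(adjacency, k, seed=None):
    """Algorithm 1 (sketch): pick k vertices uniformly without replacement
    and return the induced subgraph.

    `adjacency` maps each vertex to the set of its neighbours; the induced
    subgraph keeps only edges whose endpoints were both sampled.
    """
    rng = random.Random(seed)
    sampled = set(rng.sample(sorted(adjacency), k))
    return {v: adjacency[v] & sampled for v in sampled}

# Toy example: a path graph 0-1-2-3.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
sub = uniform_vertex_sample(adj, 2, seed=0)
```

Sampling vertices first and then restricting edges (rather than sampling edges) is what makes the estimator consistent with the subgraph-counting asymptotics the paper analyzes.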
Open Source Code | Yes | "The code used for the experiments can be found at https://github.com/AndrewDavidson21/regularized_node_embeddings."
Open Datasets | Yes | "We perform experiments on the Cora, CiteSeer and PubMed Diabetes citation network datasets (see Appendix G for more details), which we use as they are commonly used benchmark datasets, see, e.g., [26, 28, 34, 64]. ... All are publicly available through the StellarGraph library [20]."
Dataset Splits | Yes | "For the link prediction experiments, we create a training graph by removing 10% of both the edges and non-edges within the network, and use this to learn an embedding of the network. We then form link embeddings by taking the entry-wise product of the corresponding node embeddings, use 10% of the held-out edges to build a logistic classifier for the link categories, and then evaluate the performance on the remaining edges, repeating this process 50 times. ... To evaluate performance for the node classification task, we learn a network embedding without access to the node labels, and then learn/evaluate a one-versus-rest multinomial node classifier using 5%/95% stratified training/test splits of the node labels."
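The link-prediction pipeline described in this row (entry-wise product of endpoint embeddings, then a logistic classifier) can be sketched as follows. This is an illustration with random embeddings and random stand-in edge pairs, not the paper's node2vec output or its exact evaluation code; all names here are our own.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def link_features(emb, pairs):
    """Hadamard (entry-wise) product of the two endpoint embeddings,
    as in the paper's link-prediction setup. emb: (n, d); pairs: (m, 2)."""
    pairs = np.asarray(pairs)
    return emb[pairs[:, 0]] * emb[pairs[:, 1]]

# Stand-in data: random embeddings and random "edge"/"non-edge" pairs.
rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 16))          # 100 nodes, 16-dim embeddings
pos = rng.integers(0, 100, size=(200, 2)) # placeholder positive pairs
neg = rng.integers(0, 100, size=(200, 2)) # placeholder negative pairs

X = np.vstack([link_features(emb, pos), link_features(emb, neg)])
y = np.concatenate([np.ones(200), np.zeros(200)])

# Logistic classifier over the link embeddings.
clf = LogisticRegression(max_iter=1000).fit(X, y)
```

In the paper's protocol, the classifier is fit on 10% of the held-out edges and evaluated on the remainder, with the whole procedure repeated 50 times; the sketch above only shows the feature construction and model fit.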
Hardware Specification | Yes | "All experiments used a single NVIDIA GeForce RTX 3090 GPU, with Python 3.8.12, PyTorch 1.10.1 and CUDA 11.3."
Software Dependencies | Yes | "All experiments used a single NVIDIA GeForce RTX 3090 GPU, with Python 3.8.12, PyTorch 1.10.1 and CUDA 11.3."
Experiment Setup | Yes | "For node2vec, we use the default parameters as given in [25] (return_weight = 1, in_out_weight = 1, walk_length = 80, num_walks = 10, workers = 1, batch_size = 1) and embedding dimension of 128. We train all node2vec models for 50 epochs."
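For reference, the hyperparameters quoted in this row can be collected into a single configuration dict. This is only a convenience sketch mirroring the quoted values; the dict name and the `embedding_dim`/`epochs` key names are our own, and the actual training code lives in the authors' repository.

```python
# node2vec defaults quoted above, gathered in one place (sketch only).
NODE2VEC_CONFIG = {
    "return_weight": 1,   # bias for returning to the previous node (p-style)
    "in_out_weight": 1,   # bias for moving outward in the walk (q-style)
    "walk_length": 80,    # steps per random walk
    "num_walks": 10,      # walks started per node
    "workers": 1,
    "batch_size": 1,
    "embedding_dim": 128, # embedding dimension from the quoted setup
    "epochs": 50,         # training epochs from the quoted setup
}
```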