Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Asymptotics of $\ell_2$ Regularized Network Embeddings
Authors: Andrew Davison
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now examine the performance in using regularized node2vec embeddings for link prediction and node classification tasks, and illustrate comparable, when not superior, performance to more complicated encoders for network embeddings. We perform experiments on the Cora, CiteSeer and PubMed Diabetes citation network datasets |
| Researcher Affiliation | Academia | Andrew Davison Department of Statistics Columbia University New York, NY 10027 EMAIL |
| Pseudocode | Yes | Algorithm 1 (Uniform vertex sampling). Given a graph Gn and number of samples k, we select k vertices from Gn uniformly and without replacement, and then return S(Gn) as the induced subgraph using these sampled vertices. |
| Open Source Code | Yes | The code used for the experiments can be found at https://github.com/AndrewDavidson21/regularized_node_embeddings. |
| Open Datasets | Yes | We perform experiments on the Cora, CiteSeer and PubMed Diabetes citation network datasets (see Appendix G for more details), which we use as they are commonly used benchmark datasets, see e.g. [26, 28, 34, 64]. ... All are publicly available through the StellarGraph library [20]. |
| Dataset Splits | Yes | For the link prediction experiments, we create a training graph by removing 10% of both the edges and non-edges within the network, and use this to learn an embedding of the network. We then form link embeddings by taking the entry-wise product of the corresponding node embeddings, use 10% of the held-out edges to build a logistic classifier for the link categories, and then evaluate the performance on the remaining edges, repeating this process 50 times. ... To evaluate performance for the node classification task, we learn a network embedding without access to the node labels, and then learn/evaluate a one-versus-rest multinomial node classifier using 5%/95% stratified training/test splits of the node labels. |
| Hardware Specification | Yes | All experiments used a single NVIDIA GeForce RTX 3090 GPU, with Python 3.8.12, PyTorch 1.10.1 and CUDA 11.3. |
| Software Dependencies | Yes | All experiments used a single NVIDIA GeForce RTX 3090 GPU, with Python 3.8.12, PyTorch 1.10.1 and CUDA 11.3. |
| Experiment Setup | Yes | For node2vec, we use the default parameters as given in [25] (return_weight = 1, in_out_weight = 1, walk_length = 80, num_walks = 10, workers = 1, batch_size = 1) and embedding dimension of 128. We train all node2vec models for 50 epochs. |
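The quoted Algorithm 1 (uniform vertex sampling) can be sketched in a few lines. This is a minimal illustration, not the paper's code: the adjacency-dict representation and the function name `uniform_vertex_sample` are my own choices.

```python
import random

def uniform_vertex_sample(adj, k, seed=None):
    """Algorithm 1 (uniform vertex sampling): select k vertices of G_n
    uniformly and without replacement, and return the induced subgraph
    S(G_n) on those vertices.

    `adj` maps each vertex to the set of its neighbours.
    """
    rng = random.Random(seed)
    sampled = set(rng.sample(sorted(adj), k))
    # Induced subgraph: keep only edges with both endpoints sampled.
    return {v: adj[v] & sampled for v in sampled}

# Toy example: a 4-cycle 0-1-2-3-0, sampling 3 of its 4 vertices.
g = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
sub = uniform_vertex_sample(g, 3, seed=0)
```

The returned dict has exactly `k` vertices, and every retained edge is an edge of the original graph between two sampled vertices.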
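The link-prediction protocol described under "Dataset Splits" (hold out 10% of edges and non-edges, then form link features as the entry-wise product of node embeddings) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation; `link_prediction_split` and `link_embedding` are hypothetical names, and the downstream logistic classifier is omitted.

```python
import random

def link_prediction_split(edges, non_edges, holdout_frac=0.10, seed=None):
    """Hold out a fraction of edges and non-edges for evaluation; the
    remaining edges define the training graph used to learn embeddings."""
    rng = random.Random(seed)

    def split(pairs):
        pairs = list(pairs)
        rng.shuffle(pairs)
        cut = int(len(pairs) * holdout_frac)
        return pairs[cut:], pairs[:cut]  # (training, held out)

    train_edges, test_edges = split(edges)
    train_non_edges, test_non_edges = split(non_edges)
    return train_edges, test_edges, train_non_edges, test_non_edges

def link_embedding(z_u, z_v):
    # Hadamard (entry-wise) product of the two node embeddings,
    # used as the feature vector for the link classifier.
    return [a * b for a, b in zip(z_u, z_v)]

# Toy example: 20 edges and 20 non-edges, 10% held out.
edges = [(i, i + 1) for i in range(20)]
non_edges = [(i, i + 5) for i in range(20)]
tr_e, te_e, tr_n, te_n = link_prediction_split(edges, non_edges, seed=1)
```

Per the quote, the held-out pairs are further divided so that a logistic classifier is fit on 10% of them and evaluated on the rest, with the whole procedure repeated 50 times.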