Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Graph Auto-Encoder via Neighborhood Wasserstein Reconstruction
Authors: Mingyue Tang, Pan Li, Carl Yang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on both synthetic and real-world network datasets show that the unsupervised node representations learned with NWR are much more advantageous in structure-oriented graph mining tasks, while also achieving competitive performance in proximity-oriented ones. We design our experiments to evaluate NWR-GAE, focusing on the following research questions: RQ1: How does NWR-GAE perform on structure-role-based synthetic datasets in comparison to state-of-the-art unsupervised graph embedding baselines? RQ2: How do NWR-GAE and its ablations compare to the baselines on different types of real-world graph datasets? RQ3: What are the impacts of the major model parameters, including embedding size d and sampling size q, on NWR-GAE? |
| Researcher Affiliation | Academia | Mingyue Tang¹, Carl Yang², Pan Li³; ¹Department of Engineering Systems and Environment, University of Virginia; ²Department of Computer Science, Emory University; ³Department of Computer Science, Purdue University |
| Pseudocode | No | The paper does not contain a clearly labeled section or block for "Pseudocode" or "Algorithm". |
| Open Source Code | Yes | Code available at https://github.com/mtang724/NWR-GAE. |
| Open Datasets | Yes | We use a total of nine public real-world graph datasets, which roughly belong to three types, one with proximity-oriented (assortative (Liu et al., 2020)) labels, one with structure-oriented (disassortative) labels, and one with proximity-structure-mixed labels. Among them, Cora, Citeseer, Pubmed are publication networks (Namata et al., 2012); Cornell, Texas, and Wisconsin are school department webpage networks (Pei et al., 2020); Chameleon, Squirrel are page-page networks in Wikipedia (Rozemberczki et al., 2021); Actor is an actor co-filming network (Tang et al., 2009). |
| Dataset Splits | Yes | To be consistent across all datasets in our experiments, we did not follow the standard semi-supervised setting (20 labels per class for training) on Cora, Citeseer and Pubmed, but rather randomly split all datasets into a 60% training set, a 20% validation set, and a 20% testing set, which is a common practice on WebKB and Wikipedia network datasets (e.g., Cornell, Texas, Chameleon) (Liu et al., 2020; Pei et al., 2020; Ma et al., 2021). |
| Hardware Specification | Yes | Most experiments are performed on an 8GB NVIDIA GeForce RTX 3070 GPU. |
| Software Dependencies | No | The paper mentions software like the "PyTorch Python package" and the "DGL library", but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | For all compared models, we performed hyper-parameter selection on learning rate {5e-3, 5e-4, 5e-5, 5e-6, 5e-7} and epoch size {100, 200, 300, 400, 500, 600}. For NWR-GAE, we selected the sample size q from {3, 5, 8, 10}, and the trade-off weight parameters λd, λs from {10, 1, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5}. For all settings, we use the Adam optimizer and backpropagation from the PyTorch Python package, and a fixed embedding dimension equal to the graph node feature size. |
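
The random split described in the Dataset Splits row can be sketched as follows. This is a minimal illustration, not code from the NWR-GAE repository: the function name `random_split`, the fixed seed, floor rounding with the remainder assigned to the test set, and the absence of label stratification are all assumptions, since the paper states only the 60%/20%/20% ratio.

```python
import random


def random_split(num_nodes, train_frac=0.6, val_frac=0.2, seed=0):
    """Hypothetical helper: randomly partition node indices into train/val/test.

    Assumptions (not stated in the paper): a seeded shuffle, floor rounding,
    no stratification by label, and any remainder assigned to the test set.
    """
    indices = list(range(num_nodes))
    rng = random.Random(seed)  # seeded RNG so the split is reproducible
    rng.shuffle(indices)
    n_train = int(train_frac * num_nodes)
    n_val = int(val_frac * num_nodes)
    train = sorted(indices[:n_train])
    val = sorted(indices[n_train:n_train + n_val])
    test = sorted(indices[n_train + n_val:])  # remainder goes to test
    return train, val, test


# Example: Cora has 2,708 nodes
train_idx, val_idx, test_idx = random_split(2708)
print(len(train_idx), len(val_idx), len(test_idx))  # → 1624 541 543
```

Because the split is random rather than the fixed public split, reporting the seed (or the index files themselves) is what makes a 60/20/20 result reproducible.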
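
The search space in the Experiment Setup row can be enumerated as a grid. The sketch below assumes a full Cartesian product over every listed value and that λd and λs are drawn independently from the same candidate set. The paper does not say whether the search was exhaustive or how the best configuration was selected, and the function and key names here are hypothetical.

```python
from itertools import product

# Candidate values as listed in the Experiment Setup row
LEARNING_RATES = [5e-3, 5e-4, 5e-5, 5e-6, 5e-7]
EPOCHS = [100, 200, 300, 400, 500, 600]
SAMPLE_SIZES_Q = [3, 5, 8, 10]  # NWR-GAE only
TRADEOFF_WEIGHTS = [10, 1, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5]  # shared by lambda_d and lambda_s


def baseline_grid():
    """Yield the configurations searched for all compared models."""
    for lr, epochs in product(LEARNING_RATES, EPOCHS):
        yield {"lr": lr, "epochs": epochs}


def nwr_gae_grid():
    """Yield NWR-GAE configurations, ASSUMING a full Cartesian product."""
    for base, q, lam_d, lam_s in product(
        baseline_grid(), SAMPLE_SIZES_Q, TRADEOFF_WEIGHTS, TRADEOFF_WEIGHTS
    ):
        yield {**base, "q": q, "lambda_d": lam_d, "lambda_s": lam_s}


print(len(list(baseline_grid())), len(list(nwr_gae_grid())))  # → 30 5880
```

Under that assumption the shared grid has 5 × 6 = 30 configurations and the NWR-GAE grid has 30 × 4 × 7 × 7 = 5,880.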