Graph Auto-Encoder via Neighborhood Wasserstein Reconstruction

Authors: Mingyue Tang, Carl Yang, Pan Li

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility assessment: each variable below is listed with its result and the supporting LLM response.
Research Type: Experimental
"Extensive experiments on both synthetic and real-world network datasets show that the unsupervised node representations learned with NWR are much more advantageous in structure-oriented graph mining tasks, while also achieving competitive performance in proximity-oriented ones." "We design our experiments to evaluate NWR-GAE, focusing on the following research questions:
- RQ1: How does NWR-GAE perform on structure-role-based synthetic datasets in comparison to state-of-the-art unsupervised graph embedding baselines?
- RQ2: How do NWR-GAE and its ablations compare to the baselines on different types of real-world graph datasets?
- RQ3: What are the impacts of the major model parameters, including embedding size d and sampling size q, on NWR-GAE?"
Researcher Affiliation: Academia
"Mingyue Tang¹, Carl Yang², Pan Li³. ¹Department of Engineering Systems and Environment, University of Virginia; ²Department of Computer Science, Emory University; ³Department of Computer Science, Purdue University."
Pseudocode: No
The paper does not contain a clearly labeled "Pseudocode" or "Algorithm" section or block.
Open Source Code: Yes
"Code available at https://github.com/mtang724/NWR-GAE."
Open Datasets: Yes
"We use a total of nine public real-world graph datasets, which roughly belong to three types: one with proximity-oriented (assortative (Liu et al., 2020)) labels, one with structure-oriented (disassortative) labels, and one with proximity-structure-mixed labels. Among them, Cora, Citeseer, and Pubmed are publication networks (Namata et al., 2012); Cornell, Texas, and Wisconsin are school department webpage networks (Pei et al., 2020); Chameleon and Squirrel are page-page networks in Wikipedia (Rozemberczki et al., 2021); Actor is an actor co-filming network (Tang et al., 2009)."
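Since all nine datasets are public and the paper relies on the DGL library (see Software Dependencies below), here is a minimal loading sketch. It assumes DGL's built-in dataset loaders rather than the repo's actual data pipeline, which may differ.

```python
# Minimal sketch: loading one of the cited public datasets via DGL's
# built-in loaders. Dataset class names here are DGL's, not the paper's.
import dgl.data

dataset = dgl.data.CoraGraphDataset()  # publication network (Namata et al., 2012)
graph = dataset[0]                     # a single DGLGraph
features = graph.ndata["feat"]         # node feature matrix
labels = graph.ndata["label"]          # node class labels

print(graph.num_nodes(), graph.num_edges(), features.shape)
```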
Dataset Splits: Yes
"To be consistent across all datasets in our experiments, we did not follow the standard semi-supervised setting (20 labels per class for training) on Cora, Citeseer, and Pubmed, but rather randomly split all datasets with 60% training set, 20% validation set, and 20% testing set, which is a common practice on WebKB and Wikipedia network datasets (i.e. Cornell, Texas, Chameleon, etc.) (Liu et al., 2020; Pei et al., 2020; Ma et al., 2021)."
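A minimal sketch of the 60%/20%/20% random node split described above; the repo may seed or stratify differently, so this is illustrative only.

```python
# Illustrative 60/20/20 random split over node indices, returned as
# boolean masks in the style commonly used with DGL/PyTorch graph models.
import torch

def random_split_masks(num_nodes, train_frac=0.6, val_frac=0.2, seed=0):
    gen = torch.Generator().manual_seed(seed)
    perm = torch.randperm(num_nodes, generator=gen)
    n_train = int(train_frac * num_nodes)
    n_val = int(val_frac * num_nodes)
    train_mask = torch.zeros(num_nodes, dtype=torch.bool)
    val_mask = torch.zeros(num_nodes, dtype=torch.bool)
    test_mask = torch.zeros(num_nodes, dtype=torch.bool)
    train_mask[perm[:n_train]] = True
    val_mask[perm[n_train:n_train + n_val]] = True
    test_mask[perm[n_train + n_val:]] = True  # remaining ~20% for testing
    return train_mask, val_mask, test_mask
```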
Hardware Specification: Yes
"Most experiments are performed on an 8GB NVIDIA GeForce RTX 3070 GPU."
Software Dependencies: No
The paper mentions software such as the PyTorch Python package and the DGL library, but does not provide specific version numbers for these dependencies.
Experiment Setup: Yes
"For all compared models, we performed hyper-parameter selection on learning rate {5e-3, 5e-4, 5e-5, 5e-6, 5e-7} and epoch size {100, 200, 300, 400, 500, 600}. For NWR-GAE, we selected the sample size q from {3, 5, 8, 10}, and the trade-off weight parameters λd, λs from {10, 1, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5}. For all settings, we use the Adam optimizer and backpropagation from the PyTorch Python package, and a fixed embedding dimension equal to the graph node feature size."
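A minimal sketch of the grid search implied by the quoted setup. `build_model`, `train_and_validate`, and `feature_dim` are hypothetical stand-ins for the repo's actual training entry points; the grids mirror the quoted values, and λd, λs are tied together here for brevity although the paper tunes them independently.

```python
# Hedged sketch of the hyper-parameter selection described above.
# build_model / train_and_validate / feature_dim are hypothetical names,
# not functions from the NWR-GAE repository.
import itertools
import torch

learning_rates = [5e-3, 5e-4, 5e-5, 5e-6, 5e-7]
epoch_sizes = [100, 200, 300, 400, 500, 600]
sample_sizes = [3, 5, 8, 10]                               # sample size q (NWR-GAE only)
tradeoff_weights = [10, 1, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5]   # grid for λd and λs

best = None
for lr, epochs, q, lam in itertools.product(
        learning_rates, epoch_sizes, sample_sizes, tradeoff_weights):
    # Embedding dimension is fixed to the node feature size, per the paper.
    model = build_model(hidden_dim=feature_dim, sample_size=q,
                        lambda_d=lam, lambda_s=lam)  # λd = λs here; tuned separately in the paper
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    val_score = train_and_validate(model, optimizer, epochs)
    if best is None or val_score > best[0]:
        best = (val_score, dict(lr=lr, epochs=epochs, q=q, lam=lam))
```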