Graph Auto-Encoder via Neighborhood Wasserstein Reconstruction
Authors: Mingyue Tang, Pan Li, Carl Yang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on both synthetic and real-world network datasets show that the unsupervised node representations learned with NWR are much more advantageous in structure-oriented graph mining tasks, while also achieving competitive performance in proximity-oriented ones. We design our experiments to evaluate NWR-GAE, focusing on the following research questions: RQ1: How does NWR-GAE perform on structure-role-based synthetic datasets in comparison to state-of-the-art unsupervised graph embedding baselines? RQ2: How do NWR-GAE and its ablations compare to the baselines on different types of real-world graph datasets? RQ3: What are the impacts of the major model parameters, including embedding size d and sampling size q, on NWR-GAE? |
| Researcher Affiliation | Academia | Mingyue Tang (Department of Engineering Systems and Environment, University of Virginia), Carl Yang (Department of Computer Science, Emory University), Pan Li (Department of Computer Science, Purdue University) |
| Pseudocode | No | The paper does not contain a clearly labeled section or block for "Pseudocode" or "Algorithm". |
| Open Source Code | Yes | Code available at https://github.com/mtang724/NWR-GAE. |
| Open Datasets | Yes | We use a total of nine public real-world graph datasets, which roughly belong to three types, one with proximity-oriented (assortative (Liu et al., 2020)) labels, one with structure-oriented (disassortative) labels, and one with proximity-structure-mixed labels. Among them, Cora, Citeseer, Pubmed are publication networks (Namata et al., 2012); Cornell, Texas, and Wisconsin are school department webpage networks (Pei et al., 2020); Chameleon, Squirrel are page-page networks in Wikipedia (Rozemberczki et al., 2021); Actor is an actor co-filming network (Tang et al., 2009). |
| Dataset Splits | Yes | To be consistent across all datasets in our experiments, we did not follow the standard semi-supervised setting (20 labels per class for training) on Cora, Citeseer and Pubmed, but rather randomly split all datasets with a 60% training set, 20% validation set, and 20% testing set, which is a common practice on WebKB and Wikipedia network datasets (i.e., Cornell, Texas, Chameleon, etc.) (Liu et al., 2020; Pei et al., 2020; Ma et al., 2021). (See the split sketch after this table.) |
| Hardware Specification | Yes | Most experiments are performed on an 8GB NVIDIA GeForce RTX 3070 GPU. |
| Software Dependencies | No | The paper mentions software like the "PyTorch Python package" and the "DGL library", but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | For all compared models, we performed hyper-parameter selection on learning rate {5e-3, 5e-4, 5e-5, 5e-6, 5e-7} and epoch size {100, 200, 300, 400, 500, 600}. For NWR-GAE, we selected the sample size q from {3, 5, 8, 10}, and the trade-off weight parameters λd, λs from {10, 1, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5}. For all settings, we use the Adam optimizer and backpropagation from the PyTorch Python package, and a fixed embedding dimension equal to the graph node feature size. (See the grid-search sketch after this table.) |
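The 60/20/20 random node split quoted above is described only in prose and no split files are released, so any reproduction has to make choices. Below is a minimal sketch assuming a DGL `CoraGraphDataset` and an arbitrary seed; both the dataset choice and the seed are assumptions, not the authors' exact procedure.

```python
import torch
from dgl.data import CoraGraphDataset

# Hypothetical reproduction of the 60%/20%/20% random node split;
# the paper does not release split files, so the seed is an assumption.
dataset = CoraGraphDataset()
g = dataset[0]
n = g.num_nodes()

torch.manual_seed(0)  # assumed seed, not specified in the paper
perm = torch.randperm(n)
n_train, n_val = int(0.6 * n), int(0.2 * n)

train_mask = torch.zeros(n, dtype=torch.bool)
val_mask = torch.zeros(n, dtype=torch.bool)
test_mask = torch.zeros(n, dtype=torch.bool)
train_mask[perm[:n_train]] = True
val_mask[perm[n_train:n_train + n_val]] = True
test_mask[perm[n_train + n_val:]] = True

# Overwrite the dataset's default masks with the random 60/20/20 split.
g.ndata["train_mask"] = train_mask
g.ndata["val_mask"] = val_mask
g.ndata["test_mask"] = test_mask
```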
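Similarly, the hyper-parameter selection in the last row amounts to a grid search over the reported ranges. The sketch below uses those ranges verbatim; `train_model` and the `torch.nn.Linear` stand-in are hypothetical placeholders for the released NWR-GAE implementation, and the paper does not state whether the search is exhaustive or tuned per dataset.

```python
import itertools
import torch

feat_dim = 1433  # e.g. Cora's node feature size; the embedding dim is fixed to it

def train_model(model, optimizer, epochs):
    # Placeholder for the actual NWR-GAE training loop;
    # here it just returns a dummy validation score.
    return 0.0

learning_rates = [5e-3, 5e-4, 5e-5, 5e-6, 5e-7]
epoch_sizes = [100, 200, 300, 400, 500, 600]
sample_sizes = [3, 5, 8, 10]                       # sampling size q
tradeoffs = [10, 1, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5]  # range for lambda_d, lambda_s

best = {"score": -float("inf")}
for lr, epochs, q, lam_d, lam_s in itertools.product(
        learning_rates, epoch_sizes, sample_sizes, tradeoffs, tradeoffs):
    # Stand-in model; the real NWR-GAE would consume q, lam_d, and lam_s.
    model = torch.nn.Linear(feat_dim, feat_dim)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    score = train_model(model, optimizer, epochs)
    if score > best["score"]:
        best = {"score": score, "lr": lr, "epochs": epochs,
                "q": q, "lambda_d": lam_d, "lambda_s": lam_s}
```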