Graph Auto-Encoder via Neighborhood Wasserstein Reconstruction

Authors: Mingyue Tang, Carl Yang, Pan Li

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility assessment: each variable below is listed with its result and the supporting LLM response.
Research Type: Experimental
"Extensive experiments on both synthetic and real-world network datasets show that the unsupervised node representations learned with NWR are much more advantageous in structure-oriented graph mining tasks, while also achieving competitive performance in proximity-oriented ones." "We design our experiments to evaluate NWR-GAE, focusing on the following research questions:
- RQ1: How does NWR-GAE perform on structure-role-based synthetic datasets in comparison to state-of-the-art unsupervised graph embedding baselines?
- RQ2: How do NWR-GAE and its ablations compare to the baselines on different types of real-world graph datasets?
- RQ3: What are the impacts of the major model parameters, including embedding size d and sampling size q, on NWR-GAE?"
Researcher Affiliation: Academia
"Mingyue Tang¹, Carl Yang², Pan Li³. ¹Department of Engineering Systems and Environment, University of Virginia; ²Department of Computer Science, Emory University; ³Department of Computer Science, Purdue University."
Pseudocode: No
The paper does not contain a clearly labeled "Pseudocode" or "Algorithm" section or block.
Open Source Code: Yes
"Code available at https://github.com/mtang724/NWR-GAE."
Open Datasets: Yes
"We use a total of nine public real-world graph datasets, which roughly belong to three types: one with proximity-oriented (assortative (Liu et al., 2020)) labels, one with structure-oriented (disassortative) labels, and one with proximity-structure-mixed labels. Among them, Cora, Citeseer, and Pubmed are publication networks (Namata et al., 2012); Cornell, Texas, and Wisconsin are school department webpage networks (Pei et al., 2020); Chameleon and Squirrel are page-page networks in Wikipedia (Rozemberczki et al., 2021); Actor is an actor co-filming network (Tang et al., 2009)."
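Since all nine datasets are public and the paper relies on the DGL library (see Software Dependencies below), here is a minimal loading sketch. It assumes DGL's built-in dataset loaders rather than the repo's actual data pipeline, which may differ.

```python
# Minimal sketch: loading one of the cited public datasets via DGL's
# built-in loaders. Dataset class names here are DGL's, not the paper's.
import dgl.data

dataset = dgl.data.CoraGraphDataset()  # publication network (Namata et al., 2012)
graph = dataset[0]                     # a single DGLGraph
features = graph.ndata["feat"]         # node feature matrix
labels = graph.ndata["label"]          # node class labels

print(graph.num_nodes(), graph.num_edges(), features.shape)
```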
Dataset Splits: Yes
"To be consistent across all datasets in our experiments, we did not follow the standard semi-supervised setting (20 labels per class for training) on Cora, Citeseer, and Pubmed, but rather randomly split all datasets with 60% training set, 20% validation set, and 20% testing set, which is a common practice on WebKB and Wikipedia network datasets (i.e. Cornell, Texas, Chameleon, etc.) (Liu et al., 2020; Pei et al., 2020; Ma et al., 2021)."
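A minimal sketch of the 60%/20%/20% random node split described above; the repo may seed or stratify differently, so this is illustrative only.

```python
# Illustrative 60/20/20 random split over node indices, returned as
# boolean masks in the style commonly used with DGL/PyTorch graph models.
import torch

def random_split_masks(num_nodes, train_frac=0.6, val_frac=0.2, seed=0):
    gen = torch.Generator().manual_seed(seed)
    perm = torch.randperm(num_nodes, generator=gen)
    n_train = int(train_frac * num_nodes)
    n_val = int(val_frac * num_nodes)
    train_mask = torch.zeros(num_nodes, dtype=torch.bool)
    val_mask = torch.zeros(num_nodes, dtype=torch.bool)
    test_mask = torch.zeros(num_nodes, dtype=torch.bool)
    train_mask[perm[:n_train]] = True
    val_mask[perm[n_train:n_train + n_val]] = True
    test_mask[perm[n_train + n_val:]] = True  # remaining ~20% for testing
    return train_mask, val_mask, test_mask
```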
Hardware Specification: Yes
"Most experiments are performed on an 8GB NVIDIA GeForce RTX 3070 GPU."
Software Dependencies: No
The paper mentions software such as the PyTorch Python package and the DGL library, but does not provide specific version numbers for these dependencies.
Experiment Setup: Yes
"For all compared models, we performed hyper-parameter selection on learning rate {5e-3, 5e-4, 5e-5, 5e-6, 5e-7} and epoch size {100, 200, 300, 400, 500, 600}. For NWR-GAE, we selected the sample size q from {3, 5, 8, 10}, and the trade-off weight parameters λd, λs from {10, 1, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5}. For all settings, we use the Adam optimizer and backpropagation from the PyTorch Python package, and a fixed embedding dimension equal to the graph node feature size."
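A minimal sketch of the grid search implied by the quoted setup. `build_model`, `train_and_validate`, and `feature_dim` are hypothetical stand-ins for the repo's actual training entry points; the grids mirror the quoted values, and λd, λs are tied together here for brevity although the paper tunes them independently.

```python
# Hedged sketch of the hyper-parameter selection described above.
# build_model / train_and_validate / feature_dim are hypothetical names,
# not functions from the NWR-GAE repository.
import itertools
import torch

learning_rates = [5e-3, 5e-4, 5e-5, 5e-6, 5e-7]
epoch_sizes = [100, 200, 300, 400, 500, 600]
sample_sizes = [3, 5, 8, 10]                               # sample size q (NWR-GAE only)
tradeoff_weights = [10, 1, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5]   # grid for λd and λs

best = None
for lr, epochs, q, lam in itertools.product(
        learning_rates, epoch_sizes, sample_sizes, tradeoff_weights):
    # Embedding dimension is fixed to the node feature size, per the paper.
    model = build_model(hidden_dim=feature_dim, sample_size=q,
                        lambda_d=lam, lambda_s=lam)  # λd = λs here; tuned separately in the paper
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    val_score = train_and_validate(model, optimizer, epochs)
    if best is None or val_score > best[0]:
        best = (val_score, dict(lr=lr, epochs=epochs, q=q, lam=lam))
```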