Predict then Propagate: Graph Neural Networks meet Personalized PageRank

Authors: Johannes Gasteiger, Aleksandar Bojchevski, Stephan Günnemann

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that this model outperforms several recently proposed methods for semi-supervised classification in the most thorough study done so far for GCN-like models.
Researcher Affiliation | Academia | Johannes Gasteiger, Aleksandar Bojchevski & Stephan Günnemann, Technical University of Munich, Germany, {j.gasteiger,a.bojchevski,guennemann}@in.tum.de
Pseudocode | No | The paper does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | Our implementation is available online: https://www.kdd.in.tum.de/ppnp
Open Datasets | Yes | We use four text-classification datasets for evaluation. CITESEER (Sen et al., 2008), CORA-ML (McCallum et al., 2000; Bojchevski & Günnemann, 2018) and PUBMED (Namata et al., 2012) are citation graphs... In the MICROSOFT ACADEMIC graph (Shchur et al., 2018) edges represent coauthorship.
Dataset Splits | Yes | The data is first split into a visible and a test set. For the visible set, 1500 nodes were sampled for the citation graphs and 5000 for MICROSOFT ACADEMIC. The test set contains all remaining nodes. We use three different label sets in each experiment: a training set of 20 nodes per class, an early stopping set of 500 nodes, and either a validation or test set. The validation set contains the remaining nodes of the visible set. We use 20 random seeds for determining the splits. These seeds are drawn once and fixed across runs to facilitate comparisons. We use one set of seeds for the validation splits and a different set for the test splits. (A sketch of this split protocol follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, or cloud instances) used for running the experiments.
Software Dependencies | No | We used TensorFlow (Martín Abadi et al., 2015) for all experiments except bootstrapped feature propagation.
Experiment Setup | Yes | To ensure a fair model comparison we used a neural network for PPNP that is structurally very similar to GCN and has the same number of parameters. We use two layers with h = 64 hidden units. We apply L2 regularization with λ = 0.005 on the weights of the first layer and use dropout with dropout rate d = 0.5 on both layers and the adjacency matrix. For APPNP, adjacency dropout is resampled for each power iteration step. For propagation we use the teleport probability α = 0.1 and K = 10 power iteration steps for APPNP. We use α = 0.2 on the MICROSOFT ACADEMIC graph due to its structural difference (see Figure 5 and its discussion). The combination of this shallow neural network with a comparatively high number of power iteration steps achieved the best results during hyperparameter optimization (see Appendix G). ... We use the Adam optimizer with a learning rate of l = 0.01 and cross-entropy loss for all models (Kingma & Ba, 2015). Weights are initialized as described in Glorot & Bengio (2010). The feature matrix is L1 normalized per row. (A sketch of the propagation step follows the table.)
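
The Dataset Splits row describes a concrete sampling protocol. Below is a minimal sketch of that protocol in NumPy, assuming node labels are available as an integer array; the function name sample_split and its signature are hypothetical illustrations, not taken from the paper's code.

```python
import numpy as np

def sample_split(labels, n_visible, n_train_per_class=20, n_early_stopping=500, seed=0):
    """Split nodes into train / early-stopping / validation (or test) sets."""
    rng = np.random.default_rng(seed)
    n_nodes = labels.shape[0]

    # Visible vs. test: e.g. n_visible = 1500 for the citation graphs,
    # 5000 for MICROSOFT ACADEMIC; the test set is every remaining node.
    perm = rng.permutation(n_nodes)
    visible, test = perm[:n_visible], perm[n_visible:]

    # Training set: 20 nodes per class, drawn from the visible set.
    train = np.concatenate([
        rng.choice(visible[labels[visible] == c], size=n_train_per_class, replace=False)
        for c in np.unique(labels)
    ])

    # Early-stopping set: 500 visible nodes disjoint from the training set.
    remaining = np.setdiff1d(visible, train)
    early_stopping = rng.choice(remaining, size=n_early_stopping, replace=False)

    # Validation set: all remaining visible nodes.
    validation = np.setdiff1d(remaining, early_stopping)
    return train, early_stopping, validation, test

# The paper draws 20 seeds once and fixes them across runs, e.g.:
# splits = [sample_split(labels, 1500, seed=s) for s in fixed_seeds]
```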
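
The Experiment Setup row quotes the propagation hyperparameters (α = 0.1, K = 10). In the paper, APPNP starts from the local predictions Z^(0) = H = f_θ(X) and iterates Z^(k+1) = (1 − α) Â Z^(k) + α H, where Â is the symmetrically normalized adjacency matrix with self-loops. The sketch below implements just this power iteration with SciPy sparse matrices; the prediction network f_θ, the adjacency dropout resampled per step, and the final softmax are assumed to be handled elsewhere, and the function names are hypothetical.

```python
import numpy as np
import scipy.sparse as sp

def normalize_adjacency(adj):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    adj = adj + sp.eye(adj.shape[0])
    d_inv_sqrt = np.power(np.asarray(adj.sum(axis=1)).flatten(), -0.5)
    d_mat = sp.diags(d_inv_sqrt)
    return d_mat @ adj @ d_mat

def appnp_propagate(adj_norm, local_preds, alpha=0.1, num_iterations=10):
    """Power iteration: Z^(k+1) = (1 - alpha) * A_hat @ Z^(k) + alpha * H, Z^(0) = H."""
    z = local_preds
    for _ in range(num_iterations):
        z = (1 - alpha) * (adj_norm @ z) + alpha * local_preds
    return z  # apply softmax afterwards to obtain class probabilities
```

Keeping the teleport term α H at every step is what distinguishes this approximate personalized-PageRank propagation from simply stacking K graph convolutions: it anchors each node's final prediction to its own local prediction and avoids oversmoothing even with K = 10 steps.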