Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Graph Neural Network-Inspired Kernels for Gaussian Processes in Semi-Supervised Learning
Authors: Zehao Niu, Mihai Anitescu, Jie Chen
ICLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we introduce this inductive bias into GPs to improve their predictive performance for graph-structured data. We show that a prominent example of GNNs, the graph convolutional network, is equivalent to some GP when its layers are infinitely wide; and we analyze the kernel universality and the limiting behavior in depth. We further present a programmable procedure to compose covariance kernels inspired by this equivalence and derive example kernels corresponding to several interesting members of the GNN family. We also propose a computationally efficient approximation of the covariance matrix for scalable posterior inference with large-scale data. We demonstrate that these graph-based kernels lead to competitive classification and regression performance, as well as advantages in computation time, compared with the respective GNNs. (Section 6, Experiments) In this section, we conduct a comprehensive set of experiments to evaluate the performance of the GP kernels derived by taking limits on the layer width of GCN and other GNNs. |
| Researcher Affiliation | Collaboration | Zehao Niu¹, Mihai Anitescu¹˒² (¹University of Chicago, ²Argonne National Laboratory); Jie Chen (MIT-IBM Watson AI Lab, IBM Research) |
| Pseudocode | Yes | Algorithm 1 (Computing K^(L)): K̂^(L) = Q^(L) Q^(L)ᵀ |
| Open Source Code | Yes | Code is available at https://github.com/niuzehao/gnn-gp. |
| Open Datasets | Yes | The datasets Cora/Citeseer/PubMed/Reddit, with predefined training/validation/test splits, are downloaded from the PyTorch Geometric library (Fey & Lenssen, 2019) and used as is. The dataset ArXiv comes from the Open Graph Benchmark (Hu et al., 2020b). The datasets Chameleon/Squirrel/Crocodile come from MUSAE (Rozemberczki et al., 2021). |
| Dataset Splits | Yes | The datasets Cora/Citeseer/PubMed/Reddit, with predefined training/validation/test splits, are downloaded from the PyTorch Geometric library (Fey & Lenssen, 2019) and used as is. The training/validation/test splits of the former two sets of datasets come from Geom-GCN (Pei et al., 2020), in accordance with the PyTorch Geometric library. The split for Crocodile is not available, so we conduct a random split with the same 0.48/0.32/0.20 proportion as that used for Chameleon and Squirrel (Rozemberczki et al., 2021). |
| Hardware Specification | Yes | All experiments are conducted on an NVIDIA Quadro GV100 GPU with 32GB of HBM2 memory. |
| Software Dependencies | Yes | The code is written in Python 3.10.4 as distributed with Ubuntu 22.04 LTS. We use PyTorch 1.11.0 and PyTorch Geometric 2.1.0 with CUDA 11.3. |
| Experiment Setup | Yes | For classification tasks in Table 3, the hyperparameters are set to σb = 0.0, σw = 1.0, L = 2, hidden = 256, and dropout = 0.5. GCN is trained with learning rate 0.01. For regression tasks, they are set to σb = 0.1, σw = 1.0, L = 2, hidden = 256, and dropout = 0.5. GCN is trained with learning rate 0.01. |
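The Pseudocode row quotes Algorithm 1, which factorizes the (approximate) covariance matrix as K̂^(L) = Q^(L) Q^(L)ᵀ. This is the standard reason a low-rank factor enables scalable posterior inference: with K ≈ QQᵀ for an n×r factor Q, the Woodbury identity reduces the GP solve from O(n³) to O(nr²). The sketch below is not the authors' implementation (their code is at the linked repository); the function name and setup are illustrative, showing only the generic low-rank GP regression trick that such a factorization permits.

```python
import numpy as np

def lowrank_gp_posterior_mean(Q_train, Q_test, y, noise=1e-2):
    """Posterior mean of GP regression with low-rank kernel K = Q Q^T.

    Woodbury identity: (Q Q^T + s I_n)^{-1} y
        = (y - Q (s I_r + Q^T Q)^{-1} Q^T y) / s,
    so only an r x r system is solved instead of an n x n one.
    """
    n, r = Q_train.shape
    s = noise
    A = s * np.eye(r) + Q_train.T @ Q_train          # r x r system
    alpha = (y - Q_train @ np.linalg.solve(A, Q_train.T @ y)) / s
    # Cross-covariance K_{test,train} = Q_test Q_train^T applied to alpha
    return Q_test @ (Q_train.T @ alpha)

rng = np.random.default_rng(0)
Q = rng.normal(size=(100, 8)) / np.sqrt(8)   # hypothetical rank-8 factor
w = rng.normal(size=8)
y = Q @ w                                    # target lying in the kernel's range
mean = lowrank_gp_posterior_mean(Q[:80], Q[80:], y[:80], noise=1e-6)
```

With a tiny noise term and a target in the span of the factor, the posterior mean recovers the held-out values almost exactly, while the cost is dominated by the 8×8 solve rather than an 80×80 one.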