VQGraph: Rethinking Graph Representation Space for Bridging GNNs and MLPs

Authors: Ling Yang, Ye Tian, Minkai Xu, Zhongyi Liu, Shenda Hong, Wei Qu, Wentao Zhang, Bin Cui, Muhan Zhang, Jure Leskovec

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments across seven datasets show that VQGRAPH consistently outperforms GNNs by 3.90% on average accuracy, while enjoying 828× faster inference speed. VQGRAPH also outperforms MLPs and the SOTA distillation method NOSMOG (Tian et al., 2023b) by 28.05% and 1.39% on average accuracy across datasets, respectively.
Researcher Affiliation | Collaboration | Peking University, Ant Group, Stanford University
Pseudocode | No | The paper describes methods in text and uses mathematical equations, but it does not include a clearly labeled "Pseudocode" or "Algorithm" block.
Open Source Code | Yes | Our code is available at https://github.com/YangLing0818/VQGraph
Open Datasets | Yes | We use five widely used public benchmark datasets (Zhang et al., 2022b; Yang et al., 2021a) (Citeseer, Pubmed, Cora, A-computer, and A-photo), and two large OGB datasets (Hu et al., 2020a) (Arxiv and Products) to evaluate the proposed model.
Dataset Splits | Yes | For the transductive (tran) setting, we train our models on the labeled graph $\mathcal{G}$, along with the corresponding feature matrix $X^L$ and label vector $Y^L$, before evaluating their performance on the unlabeled data $X^U$ and $Y^U$. Soft labels and soft code assignments are generated for all nodes within the graph (i.e., $y_v^{\text{soft}}$, $r_v^{GNN}$, $r_v^{MLP}$ for $v \in \mathcal{V}$). As for the inductive (ind) setting, we follow the methodology of prior work (Tian et al., 2023b) in randomly selecting 20% of the data for inductive evaluation. Specifically, we divide the unlabeled nodes $\mathcal{V}^U$ into two separate yet non-overlapping subsets, observed and inductive (i.e., $\mathcal{V}^U = \mathcal{V}^U_{obs} \sqcup \mathcal{V}^U_{ind}$), producing three distinct graphs, $\mathcal{G} = \mathcal{G}^L \sqcup \mathcal{G}^U_{obs} \sqcup \mathcal{G}^U_{ind}$, wherein there are no shared nodes. ... We employ a test dataset $\mathcal{V}^U_{ind}$, which contains 20% of the test data, and another dataset $\mathcal{V}^U_{obs}$, containing the remaining 80% of the test data. (A minimal split sketch is given after this table.)
Hardware Specification | No | The paper discusses inference time and efficiency but does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, such as "Python 3.8" or "PyTorch 1.9".
Experiment Setup | Yes | Table 12: Hyperparameters of VQGRAPH. This includes details such as MLP layers, hidden dim, learning rate, weight decay, dropout, and the weighting factors $\alpha$ for $\mathcal{L}_{\text{class\_distill}}$ and $\beta$ for $\mathcal{L}_{\text{code\_distill}}$. (A hedged configuration/loss sketch is given after this table.)
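
To make the transductive/inductive protocol in the Dataset Splits row concrete, here is a minimal Python sketch of the 80/20 division of unlabeled nodes into observed and inductive subsets. The function name, the NumPy representation of node indices, and the fixed seed are illustrative assumptions; the released VQGraph code may implement this step differently.

```python
import numpy as np

def split_unlabeled_nodes(unlabeled_idx, ind_ratio=0.2, seed=0):
    """Split unlabeled node indices into disjoint 'observed' (80%) and
    'inductive' (20%) subsets, mirroring the protocol described above.
    Names and defaults are illustrative, not taken from the paper's code."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.asarray(unlabeled_idx))
    n_ind = int(len(shuffled) * ind_ratio)   # 20% held out for inductive evaluation
    ind_idx = shuffled[:n_ind]
    obs_idx = shuffled[n_ind:]               # remaining 80% stay observed during training
    return obs_idx, ind_idx

# Example: split 1,000 unlabeled node ids.
obs_idx, ind_idx = split_unlabeled_nodes(np.arange(1000))
assert len(ind_idx) == 200 and len(obs_idx) == 800
```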
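
Similarly, the hyperparameter fields listed in Table 12 and the two distillation weights $\alpha$ and $\beta$ can be pictured with the following hedged PyTorch sketch. The placeholder values in `config`, the KL-divergence form of both distillation terms, and the additive combination with the supervised loss are assumptions for illustration only; consult Table 12 and the released code for the exact objective and per-dataset settings.

```python
import torch
import torch.nn.functional as F

# Illustrative hyperparameter fields mirroring Table 12 (placeholder values,
# not the paper's per-dataset settings).
config = {
    "mlp_layers": 2,
    "hidden_dim": 128,
    "learning_rate": 1e-3,
    "weight_decay": 5e-4,
    "dropout": 0.5,
    "alpha": 0.5,   # weight on the class-distillation term
    "beta": 0.5,    # weight on the code-distillation term
}

def total_loss(student_logits, labels, teacher_soft_labels,
               student_code_logits, teacher_code_assignments,
               alpha=config["alpha"], beta=config["beta"]):
    """Supervised cross-entropy plus class- and code-distillation terms,
    combined additively with weights alpha and beta (an assumed formulation,
    not necessarily the paper's exact objective)."""
    l_ce = F.cross_entropy(student_logits, labels)
    # Match the GNN teacher's soft class distribution.
    l_class = F.kl_div(F.log_softmax(student_logits, dim=-1),
                       teacher_soft_labels, reduction="batchmean")
    # Match the teacher's soft assignments over the learned codebook tokens.
    l_code = F.kl_div(F.log_softmax(student_code_logits, dim=-1),
                      teacher_code_assignments, reduction="batchmean")
    return l_ce + alpha * l_class + beta * l_code

# Example usage with random tensors (8 nodes, 7 classes, 256 codebook tokens).
logits = torch.randn(8, 7)
labels = torch.randint(0, 7, (8,))
soft = F.softmax(torch.randn(8, 7), dim=-1)
code_logits = torch.randn(8, 256)
code_soft = F.softmax(torch.randn(8, 256), dim=-1)
print(total_loss(logits, labels, soft, code_logits, code_soft))
```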