Rethinking the Power of Graph Canonization in Graph Representation Learning with Stability
Authors: Zehao Dong, Muhan Zhang, Philip Payne, Michael A Province, Carlos Cruchaga, Tianyu Zhao, Fuhai Li, Yixin Chen
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A comprehensive set of experiments demonstrates the effectiveness of the proposed method. On many popular graph benchmark datasets, graph canonization successfully enhances GNNs and provides highly competitive performance, indicating the capability and great potential of the proposed method in general graph representation learning. On graph datasets where the sufficient condition holds, GNNs enhanced by universal graph canonization consistently outperform GNN baselines and improve the SOTA performance by up to 31%, providing an effective solution to numerous challenging real-world graph analytical tasks such as gene network representation learning in bioinformatics. |
| Researcher Affiliation | Academia | Zehao Dong1, Muhan Zhang2, Philip R.O. Payne3, Michael A. Province4, Carlos Cruchaga5, Tianyu Zhao6, Fuhai Li3,7, Yixin Chen1. {zehao.dong,prpayne,mprovince,cruchagac,tzhao}@wustl.edu, chen@cse.wustl.edu, muhan@pku.edu.cn. 1 Department of Computer Science & Engineering, Washington University in St. Louis; 2 Institute for Artificial Intelligence, Peking University; 3 Institute for Informatics, Data Science, and Biostatistics, Washington University School of Medicine; 4 Department of Genetics, Washington University School of Medicine; 5 Department of Psychiatry, Washington University School of Medicine; 6 Department of Radiation Oncology, Washington University School of Medicine; 7 Department of Pediatrics, Washington University School of Medicine |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures). |
| Open Source Code | Yes | Our source code is available at https://github.com/zehao-dong/RethinkGraphCanonical. |
| Open Datasets | Yes | Synthetic datasets and TU datasets. EXP (Abboud et al., 2020) and CSL (Murphy et al., 2019) are two graph isomorphism test datasets... TU datasets (Dobson & Doig, 2003; Toivonen et al., 2003) include five graph datasets: PROTEINS, PTC-MR, MUTAG, ENZYMES, D&D. ...Mayo and RosMap are designed for Alzheimer's disease (AD) classification (De Jager et al., 2018; Allen et al., 2016); Cancer is designed for cancer subtype classification. ...Brain graphs in the public dataset fMRI ABIDE (Di Martino et al., 2014)... In addition, we need to point out that ESAN with the EGO and EGO+ policies can be considered a subgraph-based GNN. However, when the node-deleted subgraph (ND) and edge-deleted subgraph (ED) policies are used, a graph is mapped to the set of subgraphs obtained by removing a single node or edge, and ESAN can then be understood through the marking prism. In our experiments, ESAN is equipped with the ED policy. |
| Dataset Splits | Yes | We perform 10-fold (or 5-fold) cross-validation for robust comparison. |
| Hardware Specification | Yes | All experiments are implemented in the environment of PyTorch using NVIDIA A40 GPUs. |
| Software Dependencies | No | The paper only mentions 'PyTorch' without specifying a version number for it or any other software components, which is required for reproducibility. |
| Experiment Setup | Yes | Our graph-canonization-based GNNs take backbone GNNs from {GIN, GCN, GraphSAGE, GAT}. In GNN baselines, the embedding dimension of the graph convolution layer is set to 32. The number of graph convolution layers is selected from the set {2, 3, 4}. The graph-level readout function is selected from {mean, sum, sortpool}. In NGNN, we use height-1 rooted subgraphs to avoid the out-of-memory problem in gene network datasets. The experimental settings follow Dong et al. (2022b) on dataset NA/BN and follow Zhang & Li (2021) on TU datasets. The training protocol consists of the selection of learning rates and training stop rules. Specifically, the learning rate of the optimizer is picked from the set {1e-4, 1e-3, 1e-2}; the training process is stopped when the validation metric does not improve further under a patience of 10 epochs. |
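The training protocol described in the experiment-setup row (a learning-rate grid of {1e-4, 1e-3, 1e-2} plus patience-10 early stopping on the validation metric) can be sketched in plain Python. This is a minimal illustration of the two rules, not the authors' released code; the helper names `early_stop_epoch` and `select_learning_rate` are invented for this example.

```python
def early_stop_epoch(val_metrics, patience=10):
    """Return the epoch at which training stops: the first epoch that is
    `patience` epochs past the best validation metric seen so far, or the
    final epoch if that never happens."""
    best, best_epoch = float("-inf"), 0
    for epoch, metric in enumerate(val_metrics):
        if metric > best:
            best, best_epoch = metric, epoch
        elif epoch - best_epoch >= patience:
            return epoch  # patience exhausted
    return len(val_metrics) - 1


def select_learning_rate(train_fn, grid=(1e-4, 1e-3, 1e-2)):
    """Grid search: train once per candidate learning rate and keep the one
    with the best validation metric. `train_fn(lr)` is assumed to return
    that metric (higher is better)."""
    results = {lr: train_fn(lr) for lr in grid}
    return max(results, key=results.get)
```

For example, with validation accuracies that peak at epoch 2 and then plateau, `early_stop_epoch` halts ten epochs later, matching the paper's stated stopping rule.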