Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

DrGNN: Deep Residual Graph Neural Network with Contrastive Learning

Authors: Lecheng Zheng, Dongqi Fu, Ross Maciejewski, Jingrui He

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Empirical experiments on multiple real-world datasets demonstrate that DrGNN outperforms state-of-the-art deep graph representation baseline algorithms. The code of our method is available at the GitHub link: https://github.com/zhenglecheng/DrGNN.
Researcher Affiliation Collaboration Lecheng Zheng EMAIL University of Illinois Urbana-Champaign Dongqi Fu EMAIL Meta AI Ross Maciejewski EMAIL Arizona State University Jingrui He EMAIL University of Illinois Urbana-Champaign
Pseudocode No The paper describes the methodology using mathematical equations (Eq. 1-5) and prose, but does not include any explicitly labeled pseudocode or algorithm block.
Open Source Code Yes The code of our method is available at the GitHub link: https://github.com/zhenglecheng/DrGNN.
Open Datasets Yes Datasets. Cora (Lu & Getoor, 2003) dataset is a citation network consisting of 5,429 edges and 2,708 scientific publications from 7 classes. The edge in the graph represents the citation of one paper by another. CiteSeer (Lu & Getoor, 2003) dataset consists of 3,327 scientific publications which could be categorized into 6 classes, and this citation network has 9,228 edges. PubMed (Namata et al., 2012) is a citation network consisting of 88,651 edges and 19,717 scientific publications from 3 classes. Reddit (Hamilton et al., 2017b) dataset is extracted from Reddit posts, which consists of 4,584 nodes and 19,460 edges. Notice that we follow the splitting strategy used in (Zhao & Akoglu, 2020) by randomly sampling 3% of the nodes as the training samples, 10% of the nodes as the validation samples, and the remaining 87% as the test samples. Moreover, we follow the OGB benchmark (Hu et al., 2020) for the large-scale dataset OGB-arXiv (Wang et al., 2020), which is a citation network and consists of 1,166,243 edges and 169,343 nodes from 40 classes. Also, we adopt the non-homophilous benchmark (Lim et al., 2021) for the heterophilous version of OGB-arXiv, which is denoted as arXiv-year, and the edge homophily is just 0.222.
Dataset Splits Yes Notice that we follow the splitting strategy used in (Zhao & Akoglu, 2020) by randomly sampling 3% of the nodes as the training samples, 10% of the nodes as the validation samples, and the remaining 87% as the test samples. Moreover, we follow the OGB benchmark (Hu et al., 2020) for the large-scale dataset OGB-arXiv (Wang et al., 2020), which is a citation network and consists of 1,166,243 edges and 169,343 nodes from 40 classes.
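The 3%/10%/87% random node split quoted above can be sketched as follows; this is an illustrative reconstruction (function name, seed, and use of NumPy are assumptions, not taken from the paper's code):

```python
import numpy as np

def split_nodes(num_nodes, train_frac=0.03, val_frac=0.10, seed=0):
    """Randomly partition node indices into train/val/test sets,
    following the 3%/10%/87% strategy described in the quote above."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_nodes)
    n_train = int(num_nodes * train_frac)
    n_val = int(num_nodes * val_frac)
    train = perm[:n_train]
    val = perm[n_train:n_train + n_val]
    test = perm[n_train + n_val:]  # remaining ~87% of nodes
    return train, val, test

# Example: Cora has 2,708 nodes.
train, val, test = split_nodes(2708)
```

Note that the actual paper cites the split of Zhao & Akoglu (2020); this sketch only mirrors the stated fractions.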
Hardware Specification Yes The experiments are performed on a Windows machine with a 16GB RTX 5000 GPU.
Software Dependencies No The paper mentions optimizers (RMSProp, ADAGRAD) and uses libraries like Scikit-learn, but does not provide specific version numbers for any software dependencies used for the main implementation or experiments.
Experiment Setup Yes For a fair comparison, we set the dropout rate to 0.5, the weight decay rate to 0.0005, and the total number of iterations to 1500 for all baseline methods in Table 1 and Table 3; if not specialized, GCN is chosen as the backbone, and the dimension of each layer is set to 50 for all the graph neural network baseline methods. Moreover, we set the learning rate to be 0.001 and the optimizer is RMSProp, which is one variant of ADAGRAD (Duchi et al., 2011). Table 6: Hyperparameters for DrGNN shown in Table 3: Cora λ = 20, α = 0.03; CiteSeer λ = 10, α = 0.02; PubMed λ = 18, α = 0.1; Reddit λ = 20, α = 0.02.
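The reported setup can be collected into a small configuration sketch; the dictionary names and the `config_for` helper are illustrative (not from the paper's repository), while the values themselves come from the quoted setup and Table 6:

```python
# Shared settings quoted from the Experiment Setup row above.
COMMON = {
    "dropout": 0.5,
    "weight_decay": 5e-4,
    "iterations": 1500,
    "hidden_dim": 50,        # per-layer width for the GNN baselines
    "lr": 1e-3,
    "optimizer": "RMSProp",  # a variant of ADAGRAD (Duchi et al., 2011)
    "backbone": "GCN",       # default unless otherwise specialized
}

# Per-dataset DrGNN hyperparameters from Table 6.
PER_DATASET = {
    "Cora":     {"lambda": 20, "alpha": 0.03},
    "CiteSeer": {"lambda": 10, "alpha": 0.02},
    "PubMed":   {"lambda": 18, "alpha": 0.1},
    "Reddit":   {"lambda": 20, "alpha": 0.02},
}

def config_for(dataset):
    """Merge the shared settings with the dataset-specific lambda and alpha."""
    return {**COMMON, **PER_DATASET[dataset]}
```

For example, `config_for("Cora")` yields the shared training settings plus λ = 20 and α = 0.03.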