Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

DUALFormer: Dual Graph Transformer

Authors: Jiaming Zhuo, Yuwei Liu, Yintong Lu, Ziyi Ma, Kun Fu, Chuan Wang, Yuanfang Guo, Zhen Wang, Xiaochun Cao, Liang Yang

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments on eleven real-world datasets demonstrate its effectiveness and efficiency over existing state-of-the-art GTs. ... Section 4: EXPERIMENTS ... Node Classification. The results of the models on the node classification task are presented in Tab. 2 and Fig. 2. ... Node Property Prediction. This experiment is designed to evaluate the scalability and effectiveness of the models by testing them on four large-scale graph datasets. ... Scalability Study. This experiment aims to thoroughly examine the scalability of DUALFormer. ... Backbone Performance Evaluation. This experiment is designed to evaluate the impact of different backbone GNNs...
Researcher Affiliation Academia 1Hebei Province Key Laboratory of Big Data Calculation, School of Artificial Intelligence, Hebei University of Technology, Tianjin, China 2School of Computer Science and Technology, Beijing Jiaotong University, Beijing, China 3School of Computer Science and Engineering, Beihang University, Beijing, China 4School of Artificial Intelligence, Optics and Electronics (iOPEN), School of Cybersecurity, Northwestern Polytechnical University, Xi'an, China 5School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University, Shenzhen, China
Pseudocode Yes A ALGORITHM DESCRIPTION The specifics of the DUALFormer architecture are presented in Algorithm 1, whereas the innovative global attention module is elaborated in Algorithm 2. Algorithm 1: PyTorch-style Code for DUALFormer Algorithm 2: PyTorch-style Code for Global Attention Layer
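The paper's Algorithm 2 (the global attention layer) is not reproduced in this report. As a rough illustration of what a PyTorch-style global attention layer of this kind typically computes, here is a minimal numpy sketch of single-head scaled dot-product attention over all nodes, mixed back into the input via a residual weight α (the α grid {0.1, 0.3, 0.5} appears in the paper's hyper-parameter search). All names, shapes, and the exact residual form are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def global_attention_layer(x, wq, wk, wv, alpha=0.5):
    """Hypothetical sketch of one global attention layer over all N nodes.

    x : (N, d) node features; wq, wk, wv : (d, d) projection matrices.
    alpha weights the residual connection back to the input features.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(x.shape[1])   # (N, N) all-pairs attention logits
    out = softmax(scores) @ v                # attended node features, (N, d)
    return alpha * x + (1 - alpha) * out     # residual mix

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))
wq, wk, wv = (rng.standard_normal((8, 8)) * 0.1 for _ in range(3))
y = global_attention_layer(x, wq, wk, wv, alpha=0.3)
```

The quadratic (N, N) score matrix is why the paper separately evaluates scalability on large graphs with mini-batch training.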
Open Source Code No The paper does not provide explicit open-source code for the DUALFormer methodology, only for the baselines used in the comparison. There is no specific repository link or a clear affirmative statement of code release for the authors' own work.
Open Datasets Yes Datasets. In the node classification experiments, seven benchmark datasets are employed, including Cora (Sen et al., 2008), CiteSeer (Sen et al., 2008), PubMed (Sen et al., 2008), Computers (Shchur et al., 2018), Photo (Shchur et al., 2018), CS (Shchur et al., 2018), and Physics (Shchur et al., 2018). For the node property prediction, four benchmark datasets are utilized, including ogbn-proteins (Hu et al., 2020), ogbn-arxiv (Hu et al., 2020), ogbn-products (Hu et al., 2020), and pokec (Jure, 2014). For details on the utilized datasets, refer to Section D.1.
Dataset Splits Yes Dataset Splitting. To ensure the credibility and reproducibility of the experiments, the dataset splits follow widely accepted schemes. Specifically, the split for the Cora, CiteSeer, and PubMed datasets follows the standard public strategy referenced in Kipf & Welling (2016), which allocates 20 nodes per class for training, 500 nodes for validation, and 1000 nodes for testing. For the Computers, Photo, CS, and Physics datasets, the training, validation, and testing sets constitute 60%, 20%, and 20% of the data, respectively. For the four datasets sourced from OGB (i.e., ogbn-arxiv, ogbn-products, ogbn-proteins, and ogbn-papers100M), their standard public splits are utilized as referenced in Hu et al. (2020). For the pokec dataset, the partitioning follows Deng et al. (2024), with the training, validation, and testing sets distributed as 50%, 25%, and 25%, respectively. For the Roman-Empire and Questions datasets, the partitioning follows the scheme from Platonov et al. (2023), specifically a 50%/25%/25% split for training, validation, and testing.
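For the percentage-based schemes quoted above (e.g. 60%/20%/20% for Computers, Photo, CS, and Physics), the index bookkeeping can be sketched as follows. This is a generic helper, not the authors' code; the shuffle seed and function name are assumptions.

```python
import numpy as np

def split_indices(num_nodes, fracs=(0.6, 0.2, 0.2), seed=0):
    """Shuffle node indices and cut them into train/val/test parts."""
    assert abs(sum(fracs) - 1.0) < 1e-9
    idx = np.random.default_rng(seed).permutation(num_nodes)
    n_train = int(fracs[0] * num_nodes)
    n_val = int(fracs[1] * num_nodes)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# For 1000 nodes this yields disjoint 600/200/200 train/val/test index sets.
train, val, test = split_indices(1000, fracs=(0.6, 0.2, 0.2))
```

The Cora/CiteSeer/PubMed splits are different in kind: they are fixed public per-class allocations (20 per class / 500 / 1000), not random percentage cuts.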
Hardware Specification Yes Table 6: Servers and environment. Server 1: OS Linux 5.15.0-82-generic; CPU Intel(R) Core(TM) i7-12700K @ 3.6GHz; GPU GeForce RTX 4090. Server 2: OS Linux 5.15.0-78-generic; CPU Intel(R) Xeon(R) Platinum 8360Y @ 2.40GHz; GPU NVIDIA A800 80GB PCIe.
Software Dependencies No The paper does not explicitly provide specific software versions for the implementation of DUALFormer. While it mentions 'PyTorch-style Code' and the 'Adam optimizer', it lacks specific version numbers for PyTorch, Python, or other key libraries used for their method.
Experiment Setup Yes Hyper-parameters. The model employs a semi-supervised learning framework, in which model performance on the validation set is leveraged to tune the hyper-parameter selection. By default, the hyper-parameters are selected via a grid-search strategy. For the node classification task, DUALFormer is trained using the Adam optimizer with the learning rate from {1e-3, 1e-2} and the weight-decay rate from {1e-5, 1e-4, 1e-3}. The numbers of local graph convolution layers and global attention layers are chosen from {1, 2, 3, 4, 5, 6, 7}, and the optimal results corresponding to these selections are depicted in Fig. 6. The dimensions of hidden layers are selected from {32, 64, 128, 256}, and the impacts are analyzed in Section 4.2. For each layer, the dropout rate is chosen from {0.1, 0.3, 0.5}. For the parameters unique to the attention layer, the number of heads is chosen from {2, 4}, and the residual-connection weight α is selected from {0.1, 0.3, 0.5}. In addition, Batch Normalization and Layer Normalization are utilized as appropriate. For the node property prediction task on four large-scale graphs, a mini-batch training strategy is adopted; there, the value of α for the residual connection is chosen from {0.1, 0.2, 0.3, 0.4, 0.5}. The selection of hyper-parameters follows the compared baselines (Chen et al., 2022; Wu et al., 2024; Deng et al., 2024). Refer to Tab. 5 for the parameter selections associated with the respective outcomes.
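The grid search described above can be sketched with `itertools.product`. The grids below are copied from the quoted text; `model_score` is a hypothetical stand-in for training DUALFormer on the train split and reading validation accuracy (here a trivial placeholder so the sketch runs), and none of this is the authors' tuning code.

```python
from itertools import product

# Search grids from the quoted hyper-parameter description.
grid = {
    "lr": [1e-3, 1e-2],
    "weight_decay": [1e-5, 1e-4, 1e-3],
    "num_layers": [1, 2, 3, 4, 5, 6, 7],
    "hidden_dim": [32, 64, 128, 256],
    "dropout": [0.1, 0.3, 0.5],
    "heads": [2, 4],
    "alpha": [0.1, 0.3, 0.5],
}

def model_score(cfg):
    """Hypothetical stand-in: would train the model and return val accuracy.
    Placeholder objective so the example is runnable."""
    return -abs(cfg["lr"] - 1e-3)

def grid_search(grid, score_fn):
    """Exhaustively evaluate every combination; keep the best validation score."""
    keys = list(grid)
    best_cfg, best_val = None, float("-inf")
    for values in product(*grid.values()):
        cfg = dict(zip(keys, values))
        val = score_fn(cfg)
        if val > best_val:
            best_cfg, best_val = cfg, val
    return best_cfg, best_val

best, _ = grid_search(grid, model_score)
```

This grid has 2 x 3 x 7 x 4 x 3 x 2 x 3 = 3024 combinations, which is why such searches are normally parallelized or pruned in practice.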