Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Matcha: Mitigating Graph Structure Shifts with Test-Time Adaptation

Authors: Wenxuan Bao, Zhichen Zeng, Zhining Liu, Hanghang Tong, Jingrui He

ICLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct extensive experiments on synthetic and real-world datasets to evaluate our proposed Matcha from the following aspects: RQ1: How can Matcha empower TTA algorithms and handle various structure shifts on graphs? RQ2: To what extent can Matcha restore the representation quality better than other methods?
Researcher Affiliation	Academia	1University of Illinois Urbana-Champaign EMAIL
Pseudocode	Yes	Algorithm 1 Matcha
Open Source Code	Yes	Our code is available at https://github.com/baowenxuan/Matcha.
Open Datasets	Yes	We first adopt CSBM (Deshpande et al., 2018) to generate synthetic graphs with controlled structure and attribute shifts. ... For real-world datasets, we adopt Syn-Cora (Zhu et al., 2020), Syn-Products (Zhu et al., 2020), Twitch-E (Rozemberczki et al., 2021), and OGB-Arxiv (Hu et al., 2020).
Dataset Splits	Yes	We use non-overlapping train-test split over nodes on Syn-Cora to avoid label leakage. ... For OGB-Arxiv, we use a subgraph consisting of papers from 1950 to 2011 as the source graph, 2011 to 2014 as the validation graph, and 2014 to 2020 as the target graph.
Hardware Specification	Yes	We use single Nvidia Tesla V100 with 32GB memory. However, for the majority of our experiments, the memory usage should not exceed 8GB. We switch to Intel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz when recording the computation time.
Software Dependencies	No	The paper does not explicitly mention specific software dependencies with version numbers.
Experiment Setup	Yes	For CSBM, Syn-Cora, Syn-Products, we use GPRGNN with K = 9. The featurizer is a linear layer, followed by a batchnorm layer, and then the GPR module. The classifier is a linear layer. The dimension for representation is 32. For Twitch-E and OGB-Arxiv, we use GPRGNN with K = 5. The dimension for representation is 8 and 128, respectively.
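The Experiment Setup row refers to a GPR module with K = 9 or K = 5 hops. As a rough illustration only, and not the authors' implementation, the sketch below shows the generalized PageRank propagation that GPRGNN is built on: Z = sum over k of gamma_k * Â^k * H, where Â is the symmetrically normalized adjacency matrix with self-loops. The function names, the dense list-of-lists representation, and the personalized-PageRank weight initialization with alpha = 0.1 are assumptions made for this example; the report does not state them.

```python
import math


def normalized_adjacency(num_nodes, edges):
    """Dense A_hat = D^{-1/2} (A + I) D^{-1/2} for an undirected graph."""
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        adj[u][v] = 1.0
        adj[v][u] = 1.0
    for i in range(num_nodes):
        adj[i][i] = 1.0  # add self-loops
    deg = [sum(row) for row in adj]
    return [
        [adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
        for i in range(num_nodes)
    ]


def matmul(a, b):
    """Plain dense matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def ppr_weights(alpha, num_hops):
    """Hypothetical initialization: personalized-PageRank weights, K + 1 values summing to 1."""
    weights = [alpha * (1 - alpha) ** k for k in range(num_hops)]
    weights.append((1 - alpha) ** num_hops)
    return weights


def gpr_propagate(a_hat, hidden, gammas):
    """Return Z = sum_k gammas[k] * A_hat^k @ hidden."""
    out = [[gammas[0] * x for x in row] for row in hidden]
    current = hidden
    for gamma in gammas[1:]:
        current = matmul(a_hat, current)  # one more hop of propagation
        for i, row in enumerate(current):
            for j, x in enumerate(row):
                out[i][j] += gamma * x
    return out


if __name__ == "__main__":
    # Toy 3-node path graph with 2-dimensional node representations.
    a_hat = normalized_adjacency(3, [(0, 1), (1, 2)])
    hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    gammas = ppr_weights(alpha=0.1, num_hops=9)  # K = 9, as in the CSBM setup
    print(gpr_propagate(a_hat, hidden, gammas))
```

In the setup described above, `hidden` would correspond to the output of the linear and batchnorm layers (dimension 32, 8, or 128 depending on the dataset), and the weights would be learned rather than fixed.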