Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Higher-Order Learning with Graph Neural Networks via Hypergraph Encodings

Authors: Raphaël Pellegrin, Lukas Fesser, Melanie Weber

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our theoretical analysis shows that hypergraph-level encodings provably increase the representational power of message-passing graph neural networks beyond that of their graph-level counterparts. For complete reproducibility, we release our codebase: https://github.com/Weber-GeoML/Hypergraph_Encodings. ... Our experimental results benchmark the new encodings in combination with graph- and hypergraph-level message-passing as well as transformer-based architectures. ... Throughout all of our experiments, we treat the computation of encodings as a preprocessing step, which is first applied to all graphs in the data sets considered. We then train a GNN on a part of the preprocessed graphs and evaluate its performance on a withheld set of test graphs (nodes in the case of node classification). ... We report the mean test accuracy, along with the 95% confidence interval, for the node classification tasks on hypergraph datasets in Tab. 1 and for tasks on graph datasets in Tab. 2, 3, 10, 11, 12, and 13.
Researcher Affiliation	Academia	Raphael Pellegrin (Independent Researcher), Lukas Fesser (Harvard University), Melanie Weber (Harvard University)
Pseudocode	No	The paper describes update functions for various architectures using mathematical notation (e.g., $x_v^{l+1} = \phi_l\big(\bigoplus_{p \in \mathcal{N}_v} \psi_l(x_p^l)\big)$). In Appendix B, "Table 4: Overview of Architectures" lists the "Update Function" of each architecture in formulaic form. There are no explicit sections or figures titled "Pseudocode" or "Algorithm".
Open Source Code	Yes	For complete reproducibility, we release our codebase: https://github.com/Weber-GeoML/Hypergraph_Encodings.
Open Datasets	Yes	We consider multiple datasets commonly used for benchmarking in the literature, including social networks, chemical reaction networks, and citation networks. ... Collab, Imdb, and Reddit are proposed in (Yanardag & Vishwanathan, 2015). ... Mutag is a collection of graphs corresponding to nitroaromatic compounds (Debnath et al., 1991). ... Proteins and Enzymes are introduced in (Borgwardt et al., 2005). ... Peptides is a chemical data set introduced in (Dwivedi et al., 2022). ... We use five datasets that are naturally parametrized as hypergraphs: Pubmed, Cora co-authorship (Cora-CA), Cora co-citation (Cora-CC), Citeseer (Sen et al., 2008), and DBLP (Rossi & Ahmed, 2015). We use the same pre-processed hypergraphs as in Yadati et al. (2019), which are taken from Huang & Yang (2021).
Dataset Splits	Yes	For this semi-supervised hypernode classification task, each dataset is split so that a small fraction of labeled nodes is used for training (with label rates ranging from 0.8% to 5.2%, depending on the dataset; see Tab. 21), and the rest are used for testing and validation. Table 21: Train/test split proportions in the hypergraph datasets (dataset: nodes; train; test; classes): Cora-CA: 2,708; 140 (5.2%); 2,568 (94.8%); 7. Citeseer: 3,312; 138 (4.2%); 3,174 (95.8%); 6. Cora-CC: 2,708; 140 (5.2%); 2,568 (94.8%); 7. Pubmed: 19,717; 78 (0.4%); 19,639 (99.6%); 3.
Hardware Specification	Yes	Our experiments were conducted on a local server with the specifications presented in Tab. J. Table 39: Server specifications. Architecture: X86_64; OS: Ubuntu 20.04.5 LTS x86_64; CPU: AMD EPYC 7742 64-Core; GPU: NVIDIA A100 Tensor Core; RAM: 40GB.
Software Dependencies	No	All experiments in this paper were implemented in Python using PyTorch, NumPy, PyTorch Geometric, and Python Optimal Transport.
Experiment Setup	Yes	Throughout all of our experiments, we treat the computation of encodings as a preprocessing step, which is first applied to all graphs in the data sets considered. We then train a GNN on a part of the preprocessed graphs and evaluate its performance on a withheld set of test graphs (nodes in the case of node classification). Settings and optimization hyperparameters are held constant across baseline models for all encodings, so that hyperparameter tuning can be ruled out as a source of performance gain. We obtain the settings for the individual encoding types via hyperparameter tuning. For all preprocessing methods and hyperparameter choices, we record the test set performance of the settings with the best validation performance. ... We outline the hyperparameters used for Tab. 1, Tab. 2, Tab. 10, and Tab. 13 in Tab. 7, Tab. 8, and Tab. 9. [The paper then provides these tables with specific values for Num. Layers, Hidden Dim, Learning Rate, Dropout, Batch Size, and Epochs.]
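The paper reports mean test accuracy together with a 95% confidence interval. The sketch below shows one way to compute these values. It assumes a normal approximation (z = 1.96) over independent runs and uses the sample standard deviation. The quoted text does not say which estimator the paper actually uses. The helper name mean_with_ci95 and the per-run accuracies are hypothetical.

```python
import math
import statistics


def mean_with_ci95(accuracies: list[float]) -> tuple[float, float]:
    """Return (mean, half-width) of a 95% confidence interval over runs.

    Assumption: normal approximation with z = 1.96 and the sample standard
    deviation; this is not necessarily the estimator used in the paper.
    """
    if len(accuracies) < 2:
        raise ValueError("need at least two runs to estimate a CI")
    mean = statistics.fmean(accuracies)
    std = statistics.stdev(accuracies)  # sample std (n - 1 denominator)
    half_width = 1.96 * std / math.sqrt(len(accuracies))
    return mean, half_width


# Hypothetical per-seed test accuracies (illustrative, not paper results).
runs = [0.80, 0.82, 0.78, 0.81, 0.79]
m, hw = mean_with_ci95(runs)
print(f"{m:.3f} ± {hw:.3f}")
```

With only a handful of seeds, a Student t critical value would give a somewhat wider interval than z = 1.96.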
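The Pseudocode entry quotes the update x_v^{l+1} = φ_l(⊕_{p ∈ N_v} ψ_l(x_p^l)). The sketch below implements one such message-passing layer so the roles of ψ_l, ⊕ and φ_l are concrete. The design choices are hypothetical: ψ_l is a linear map, ⊕ is a sum, and φ_l is a linear map followed by ReLU. They do not reproduce any of the architectures in Table 4, and mp_layer is an illustrative name.

```python
import numpy as np


def mp_layer(x, neighbors, w_psi, w_phi):
    """One step of x_v^{l+1} = phi(sum_{p in N_v} psi(x_p^l)).

    Assumptions (illustrative only): psi is linear, the aggregation is a sum,
    and phi is a linear map followed by ReLU.
    x: (n, d_in) node features; neighbors: list of neighbor-index lists.
    """
    messages = x @ w_psi  # psi applied to every node's features
    agg = np.zeros((x.shape[0], w_psi.shape[1]))
    for v, nbrs in enumerate(neighbors):
        if nbrs:  # nodes without neighbors keep a zero aggregate
            agg[v] = messages[nbrs].sum(axis=0)
    return np.maximum(agg @ w_phi, 0.0)  # phi = ReLU(linear)


# Toy path graph 0 - 1 - 2 with 2-dimensional features (hypothetical data).
x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
neighbors = [[1], [0, 2], [1]]
identity = np.eye(2)
print(mp_layer(x, neighbors, identity, identity))
```

With identity weights, each node simply receives the sum of its neighbors' features. This makes the aggregation step easy to check by hand.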
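The percentages in Table 21 follow directly from the node counts: the label rate is the number of training nodes divided by the total number of nodes. The short check below recomputes them from the quoted counts. SPLITS and label_rate are illustrative names.

```python
# Node counts quoted from Table 21: (total nodes, labeled training nodes).
SPLITS = {
    "Cora-CA": (2708, 140),
    "Citeseer": (3312, 138),
    "Cora-CC": (2708, 140),
    "Pubmed": (19717, 78),
}


def label_rate(total: int, train: int) -> float:
    """Percentage of nodes whose labels are used for training."""
    return 100.0 * train / total


for name, (total, train) in SPLITS.items():
    rest = total - train  # remaining nodes are used for validation/testing
    print(f"{name}: train {label_rate(total, train):.1f}%, remaining {rest}")
```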
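The Experiment Setup entry states that the test performance reported for each configuration is the one with the best validation performance. This selection rule is an argmax over validation scores, and test scores are never consulted when choosing. The sketch below assumes each tried configuration is stored as a record. The Trial dataclass, the select_by_validation helper, and all scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One preprocessing/hyperparameter configuration and its scores."""
    config: dict
    val_acc: float
    test_acc: float


def select_by_validation(trials: list[Trial]) -> Trial:
    """Return the trial with the highest validation accuracy.

    Only its test accuracy is reported; test scores play no role in selection.
    """
    if not trials:
        raise ValueError("no trials to select from")
    return max(trials, key=lambda t: t.val_acc)


# Hypothetical trials (illustrative numbers, not results from the paper).
trials = [
    Trial({"encoding": "A", "lr": 1e-3}, val_acc=0.71, test_acc=0.69),
    Trial({"encoding": "B", "lr": 1e-3}, val_acc=0.74, test_acc=0.70),
    Trial({"encoding": "B", "lr": 1e-2}, val_acc=0.73, test_acc=0.72),
]
best = select_by_validation(trials)
print(best.config, best.test_acc)
```

In this example the third trial has the highest test score, but the second is reported because it has the highest validation score.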
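Treating encodings as a preprocessing step means the encoding is computed once for each (hyper)graph and appended to the node features before any training begins. The sketch below illustrates this pipeline with hypernode degree, the number of hyperedges containing each node, as a stand-in encoding. It is not one of the encodings studied in the paper. The functions hypernode_degree_encoding and preprocess are hypothetical names.

```python
import numpy as np


def hypernode_degree_encoding(num_nodes, hyperedges):
    """Stand-in encoding: number of hyperedges each node belongs to.

    Illustrative placeholder only, not an encoding from the paper.
    """
    deg = np.zeros((num_nodes, 1))
    for edge in hyperedges:
        for v in edge:
            deg[v, 0] += 1.0
    return deg


def preprocess(dataset):
    """Compute the encoding once per hypergraph and append it to features."""
    out = []
    for features, hyperedges in dataset:
        enc = hypernode_degree_encoding(features.shape[0], hyperedges)
        out.append((np.concatenate([features, enc], axis=1), hyperedges))
    return out


# One toy hypergraph: 4 nodes, hyperedges {0, 1, 2} and {2, 3} (hypothetical).
data = [(np.ones((4, 2)), [[0, 1, 2], [2, 3]])]
processed = preprocess(data)
print(processed[0][0])
```

Because the encoding is fixed before training, any model trained afterwards sees the same augmented features. This is what allows the baseline hyperparameters to be held constant across encodings.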