Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

FoGE: Fock Space inspired encoding for graph prompting

Authors: Takis Chytas, Rudrasis Chakraborty, Vikas Singh

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We examine our Fock-space based encoding in two separate settings: (a) as a stand-alone input of a simple model, and (b) as an extra prefix in a frozen LLM (FoGE-LLM), for graph prompting. Our initial experiments (Section 4.1) show that simple models can process FoGE embeddings quite well for traditional tasks. In Section 4.2, we show that FoGE can be successfully combined with an LLM, leading to two advantages over stand-alone models: (1) a language interface for flexible graph queries without pre-defining task types, and (2) easier integration with mature software ecosystems built around LLMs that reduce deployment overhead.
Researcher Affiliation	Collaboration	Sotirios Panagiotis Chytas¹, Rudrasis Chakraborty², Vikas Singh¹; ¹University of Wisconsin-Madison, ²Lawrence Livermore National Lab
Pseudocode	No	The paper describes the methodology conceptually and provides mathematical formulations, but it does not include any clearly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code	Yes	We provide code for grounding LLMs using our graph encodings as prompts and profile the performance of this pipeline relative to baselines, on diverse datasets. Our open-source code offers a scalable way to train FoGE-LLM even on consumer GPUs, by using FSDP [52]. For reference, GraphToken [16] is trained on TPUs (code unavailable) whereas GraphLLM [17] has a large memory/compute footprint (trained on A100 80GB). The code can be found at https://github.com/SPChytas/FoGE
Open Datasets	Yes	Datasets and Models. We performed experiments on multiple graph reasoning datasets, from simple graph-understanding tasks to hypergraphs and proteins, and aim to cover different aspects of graph understanding/reasoning. Specifically, we consider the following 7 datasets/dataset collections: (i) GraphQA [15] (ii) GraphReasoning [17] (iii) HyperGraphQA (iv) PPI [53] (v) OBNB [54] (vi) mol-HIV [55] (vii) SabDab [56]. More details about the datasets can be found in the appendix.
Dataset Splits	Yes	8. mol-HIV [55]: The ogbg-molhiv dataset consists of molecular graphs (atoms as nodes, chemical bonds as edges) labeled for the binary classification task of predicting whether a molecule inhibits HIV replication or not. Each molecule is represented with 9-dimensional atom features (e.g. atomic number, chirality, ring membership, formal charge), and the dataset is evaluated using scaffold splits with ROC-AUC as the metric. 3. HyperGraphQA: ... The training dataset consists of only 2000 instances, making it hard for large models to avoid overfitting.
Hardware Specification	No	Our open-source code offers a scalable way to train FoGE-LLM even on consumer GPUs, by using FSDP [52]. For reference, GraphToken [16] is trained on TPUs (code unavailable) whereas GraphLLM [17] has a large memory/compute footprint (trained on A100 80GB). This implementation allows the user to train this, or any similar, model on conventional GPUs with less memory while, at the same time, speeding up the process by preloading all the obtained lightweight graph embeddings to the GPUs. In Table 10, we show the runtime of FoGE as we increase the number of edges, on a conventional consumer CPU.
Software Dependencies	No	Our implementation is based on PyTorch Lightning [87], which allows us to split and train the model on multiple GPUs using FSDP.
Experiment Setup	Yes	We train the LLM-based construction with a batch size of 16 and a learning rate of 1e-3. The model required less than 10 epochs to converge, in contrast to other works that require more training time due to their elaborate graph encoders (e.g., [17]). Our implementation is based on PyTorch Lightning [87], which allows us to split and train the model on multiple GPUs using FSDP. This implementation allows the user to train this, or any similar, model on conventional GPUs with less memory while, at the same time, speeding up the process by preloading all the obtained lightweight graph embeddings to the GPUs. The merging of the graph embedding with the LLM is based on the idea of prefix tuning [18], i.e., prepending the embedding to the input text embeddings; in our case, this is done with a linear adapter. We experimented both with a single linear adapter on the input layer and with a linear adapter per layer, and the difference in the final results was only marginal. We adjust vector dimensionality from 512 to 2048 and use either a single adapter for the entire model or one adapter per layer in FoGE-LLM.
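
The Experiment Setup excerpt describes merging a graph embedding with an LLM by passing it through a linear adapter and prepending the result to the input text embeddings. The following is a minimal, dependency-free sketch of that single-adapter idea, not the authors' implementation: every function name is hypothetical, the toy dimensions (8 and 4) are assumptions chosen for readability rather than the 512 to 2048 range quoted above, and the frozen LLM itself is not modelled.

```python
import random


def make_linear_adapter(in_dim, out_dim, seed=0):
    """Hypothetical helper: build (weight, bias) for a linear map in_dim -> out_dim."""
    rng = random.Random(seed)
    scale = in_dim ** -0.5  # small uniform init; an assumption, not from the paper
    weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
              for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def apply_adapter(adapter, vector):
    """Compute weight @ vector + bias with plain Python lists."""
    weight, bias = adapter
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weight, bias)]


def prepend_graph_prefix(adapter, graph_embedding, token_embeddings):
    """Project the graph embedding to the hidden size and put it first in the sequence."""
    prefix = apply_adapter(adapter, graph_embedding)
    return [prefix] + [list(token) for token in token_embeddings]


# Toy sizes (assumed): an 8-dim graph vector and a 4-dim hidden size.
GRAPH_DIM, HIDDEN_DIM = 8, 4
adapter = make_linear_adapter(GRAPH_DIM, HIDDEN_DIM)
graph_embedding = [1.0] * GRAPH_DIM
token_embeddings = [[0.0] * HIDDEN_DIM for _ in range(3)]

sequence = prepend_graph_prefix(adapter, graph_embedding, token_embeddings)
print(len(sequence), len(sequence[0]))  # one prefix vector plus three token vectors
```

In this sketch only the adapter would carry trainable parameters, which is consistent with the frozen-LLM setting described in the Research Type row; the per-layer variant mentioned in the excerpt would need one such adapter for each layer.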