Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

The Structure of Relation Decoding Linear Operators in Large Language Models

Authors: Miranda Anna Christ, Adrián Csiszárik, Gergely Becsó, Dániel Varga

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments with tensor network models explore this hypothesis by seeking compact representations that preserve decoding capacity while exploiting the possible latent structure of relational knowledge. In these experiments we evaluate the compression capabilities of both Simple Order3Network and Triangle Tensor Network models. We train these models on the dataset of Hernandez et al. [2023] until convergence and measure the faithfulness of the resulting decoder functions. Figure 1a summarizes the results by plotting the mean faithfulness against the parameter count of trained tensor network matrices.
Researcher Affiliation	Academia	Miranda Anna Christ (1, 2), Adrián Csiszárik (2, 3), Gergely Becsó (2, 3), Dániel Varga (2). (1) Fazekas Mihály High School, Budapest, Hungary; (2) HUN-REN Alfréd Rényi Institute of Mathematics, Budapest, Hungary; (3) Eötvös Loránd University, Budapest, Hungary.
Pseudocode	No	The paper describes mathematical formulations for tensor network architectures, such as $T^{R}_{s,o} = \sum_{r} T_{s,r,o}\, v_r = \sum_{r,s',r',o'} v_r\, P^{2}_{r,r'}\, T^{0}_{s',r',o'}\, P^{1}_{s,s'}\, P^{2}_{o,o'}$. However, there are no explicit sections or figures labeled "Pseudocode" or "Algorithm" that give structured, step-by-step procedures in a code-like format.
Open Source Code	Yes	Code and data are available at the project website: https://bit.ly/structure-of-relations.
Open Datasets	Yes	We use three datasets: 1) The Dataset of Hernandez et al. [2023]: it consists of 47 distinct, mostly orthogonal relations (e.g., fruit inside color and adjective antonym). 2) Extended Dataset: our extended version of the dataset of Hernandez et al. [2023] that introduces several new relations, allowing a better understanding of the relational structure. 3) Mathematical Dataset: a novel relational dataset containing mathematical operations (e.g., number plus 6 and number times 9), providing a more controlled and, in a sense, denser relational structure. For further information, we refer to Appendix D. We release the extended and the mathematical dataset under the MIT license.
Dataset Splits	Yes	Figure 4b: Faithfulness results for the mathematics dataset. Blue bars represent relations in the training set and purple bars relations in the test set; relations were split randomly with a 75%/25% train/test ratio. Figure 7: Sample-wise faithfulness results with tensor networks on the dataset of Hernandez et al. [2023]. All bars represent the test set for a given relation, after all samples were split with a 75%/25% train/test ratio.
Hardware Specification	Yes	All experiments were run on an internal cluster of NVIDIA A100 40GB or NVIDIA H100 NVL GPUs. In total, the experiments required approximately 5,000 GPU hours.
Software Dependencies	No	The paper names the language models used (GPT-J, Llama 3.1 8B, GPT-NeoX-20B) and the optimizers (SGD, Adam, AdamW). However, it does not give version numbers for software dependencies such as Python, PyTorch, or CUDA, which are necessary for a reproducible software environment.
Experiment Setup	Yes	We performed a grid search for both models with $d_r \in \{2, 4, 6, 8, 30, 100\}$ and $d_s, d_o \in \{10, 50, 100, 300\}$; for the Triangle Tensor Network we fixed $d_x = d_y = d_z = 50$. We present all hyperparameters and their respective values in Table 4. We conducted a grid search using these values and selected the optimal optimizer, batch size, and learning rate indicated under the "Selected value" column to generate all figures in the paper.
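The Research Type row reports the faithfulness of the trained decoders. As we understand Hernandez et al. [2023], faithfulness is the fraction of samples on which the linear decoder's top-1 predicted token matches the language model's own top-1 prediction. The sketch below assumes that definition. The logit lists are hypothetical stand-ins for real model outputs, and `top1` and `faithfulness` are illustrative names, not the authors' code.

```python
from typing import Sequence


def top1(logits: Sequence[float]) -> int:
    """Index of the largest logit, i.e. the top-1 predicted token."""
    return max(range(len(logits)), key=lambda i: logits[i])


def faithfulness(decoder_logits: Sequence[Sequence[float]],
                 lm_logits: Sequence[Sequence[float]]) -> float:
    """Fraction of samples where the decoder and the LM agree on the top-1 token.

    Assumption: this mirrors the faithfulness metric of Hernandez et al. [2023];
    both arguments are hypothetical per-sample logit vectors over one vocabulary.
    """
    if len(decoder_logits) != len(lm_logits) or not lm_logits:
        raise ValueError("need the same, non-zero number of samples")
    hits = sum(top1(d) == top1(m) for d, m in zip(decoder_logits, lm_logits))
    return hits / len(lm_logits)


# Toy example: the decoder agrees with the LM on 2 of 3 samples.
dec = [[0.1, 0.9, 0.0], [0.8, 0.1, 0.1], [0.2, 0.3, 0.5]]
lm = [[0.0, 1.0, 0.0], [0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]
print(faithfulness(dec, lm))  # 2/3
```

Averaging this per-relation score over relations would give a mean faithfulness of the kind plotted in Figure 1a, assuming an unweighted mean.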
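The formula quoted in the Pseudocode row builds a relation-specific decoder matrix $T^R$. It does this by contracting an order-3 tensor with a relation vector $v_r$, where the tensor is itself factorised into a small core $T^0$ and per-index factor matrices. Below is a minimal NumPy sketch of that contraction, assuming arbitrary toy sizes and random weights. The extracted formula labels both the relation factor and the object factor $P^2$, but their shapes need not match, so the sketch keeps them as separate hypothetical arrays `P_r` and `P_o`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: d_model for the subject/object space, n_rel for the
# relation vector, and small core sizes d_s, d_r, d_o (the compressed dims).
d_model, n_rel = 16, 5
d_s, d_r, d_o = 4, 3, 4

T0 = rng.normal(size=(d_s, d_r, d_o))   # core tensor T^0_{s',r',o'}
P_s = rng.normal(size=(d_model, d_s))   # subject factor P^1_{s,s'}
P_r = rng.normal(size=(n_rel, d_r))     # relation factor P_{r,r'}
P_o = rng.normal(size=(d_model, d_o))   # object factor P_{o,o'}
v = rng.normal(size=n_rel)              # relation vector v_r

# Full order-3 tensor: T_{s,r,o} = sum_{s',r',o'} P_{s,s'} P_{r,r'} T^0_{s',r',o'} P_{o,o'}
T = np.einsum("sa,rb,abc,oc->sro", P_s, P_r, T0, P_o)

# Relation-specific decoder matrix: T^R_{s,o} = sum_r T_{s,r,o} v_r
T_R_direct = np.einsum("sro,r->so", T, v)

# The same matrix in a single contraction, without materialising T
T_R_fused = np.einsum("r,rb,abc,sa,oc->so", v, P_r, T0, P_s, P_o)

assert np.allclose(T_R_direct, T_R_fused)
print(T_R_fused.shape)  # (16, 16)
```

Contracting in one `einsum` call avoids materialising the full $d_{model} \times n_{rel} \times d_{model}$ tensor, which is the point of a compressed representation.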
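The Open Datasets row describes the Mathematical Dataset as relations such as "number plus 6" and "number times 9". A hypothetical generator for such (subject, object) pairs is sketched below. The operand range and the string format are assumptions and are not taken from Appendix D.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical relation table; names follow the examples in the Open Datasets row.
RELATIONS: Dict[str, Callable[[int], int]] = {
    "number plus 6": lambda n: n + 6,
    "number times 9": lambda n: n * 9,
}


def make_samples(relation: str, subjects: range = range(10)) -> List[Tuple[str, str]]:
    """Return (subject, object) string pairs for one arithmetic relation.

    The default operand range 0..9 is an assumption for illustration.
    """
    op = RELATIONS[relation]
    return [(str(n), str(op(n))) for n in subjects]


print(make_samples("number plus 6", range(3)))   # [('0', '6'), ('1', '7'), ('2', '8')]
print(make_samples("number times 9", range(3)))  # [('0', '0'), ('1', '9'), ('2', '18')]
```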
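The Dataset Splits row mentions two random 75%/25% splits. One is over whole relations (Figure 4b) and the other is over samples (Figure 7). The sketch below implements both under stated assumptions:

- a fixed seed;
- truncation when 75% of the count is not an integer;
- for the sample-wise case, a separate split inside each relation.

The function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

Item = TypeVar("Item")


def split_75_25(items: Sequence[Item], seed: int = 0) -> Tuple[List[Item], List[Item]]:
    """Shuffle a copy of items and cut it 75%/25% into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.75 * len(shuffled))  # truncation is an assumption
    return shuffled[:cut], shuffled[cut:]


def split_relations(relations: Sequence[str], seed: int = 0) -> Tuple[List[str], List[str]]:
    """Relation-level split: whole relations go to train or test (cf. Figure 4b)."""
    return split_75_25(relations, seed)


def split_samples(dataset: Dict[str, Sequence[Item]],
                  seed: int = 0) -> Dict[str, Tuple[List[Item], List[Item]]]:
    """Sample-level split, assumed to be done per relation (cf. Figure 7)."""
    return {rel: split_75_25(samples, seed) for rel, samples in dataset.items()}


relations = [f"relation_{i}" for i in range(8)]
train_rel, test_rel = split_relations(relations)
print(len(train_rel), len(test_rel))  # 6 2
```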
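The Software Dependencies row notes that no version numbers are given. Recording them is cheap. The sketch below uses only the standard library. The default package list is a hypothetical choice, since the paper does not name its libraries.

```python
import importlib.metadata
import platform
from typing import Dict, Iterable


def environment_report(packages: Iterable[str] = ("torch", "transformers", "numpy")) -> Dict[str, str]:
    """Collect the Python version and installed versions of the given packages.

    The default package names are illustrative assumptions.
    """
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}=={value}")
```

Writing this output next to each run's results gives a minimal record of the software environment.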
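The grid in the Experiment Setup row can be enumerated as below. This sketch makes the following assumptions:

- $d_s$ and $d_o$ vary independently, giving 6 × 4 × 4 = 96 architecture configurations.
- The optimizer, batch size, and learning rate from Table 4 are omitted, because their values are not reproduced here.
- `score` is a hypothetical callback, such as mean validation faithfulness.

```python
from itertools import product
from typing import Callable, Dict, List

# Values quoted in the Experiment Setup row.
D_R = [2, 4, 6, 8, 30, 100]
D_S_AND_O = [10, 50, 100, 300]
D_XYZ = 50  # fixed size for the Triangle Tensor Network


def order3_grid() -> List[Dict[str, int]]:
    """Every (d_r, d_s, d_o) combination, assuming d_s and d_o vary independently."""
    return [{"d_r": r, "d_s": s, "d_o": o}
            for r, s, o in product(D_R, D_S_AND_O, D_S_AND_O)]


def triangle_grid() -> List[Dict[str, int]]:
    """The same grid with d_x = d_y = d_z fixed at 50."""
    return [dict(cfg, d_x=D_XYZ, d_y=D_XYZ, d_z=D_XYZ) for cfg in order3_grid()]


def select_best(grid: List[Dict[str, int]],
                score: Callable[[Dict[str, int]], float]) -> Dict[str, int]:
    """Return the configuration with the highest score (first one on ties)."""
    return max(grid, key=score)


print(len(order3_grid()))  # 96
```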