Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

SPARTAN: A Sparse Transformer World Model Attending to What Matters

Authors: Anson Lei, Bernhard Schölkopf, Ingmar Posner

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirically, we evaluate SPARTAN against the current state-of-the-art in object-centric world models in observation-based environments and demonstrate that our model can learn local causal graphs that accurately reflect the underlying interactions between objects, achieving significantly improved few-shot adaptation to dynamics changes, as well as robustness against distractors.
Researcher Affiliation	Academia	Anson Lei, Applied AI Lab, University of Oxford, UK; Bernhard Schölkopf, MPI for Intelligent Systems, Tübingen, Germany; Ingmar Posner, Applied AI Lab, University of Oxford, UK
Pseudocode	No	The paper describes the methodology using mathematical equations and textual explanations, but it does not include any clearly labeled pseudocode or algorithm blocks. For instance, Section 3.1 'Learnable Sparse Connections' explains the process with equations like (3), (4), and (5), but it's not formatted as an algorithm.
Open Source Code	Yes	The code for reproducing the main results are included in the supplementary materials. We plan to open source our code upon acceptance.
Open Datasets	Yes	Datasets: We evaluate our model in three domains: Interventional Pong, CREATE, and Traffic. The Interventional Pong dataset [27], a standard benchmark for causal representation learning, is based on the Pong game with interventions. The CREATE [16] environment is a 2D physics simulation that consists of interacting objects such as ladders, cannons, and balls. For the Traffic domain, we use the Waymo Open Dataset [44] which is collected in real life.
Dataset Splits	No	The paper mentions training on 'observation sequences sampled from different environments' and adaptation to 'a sample of five trajectories from an intervened environment'. However, it does not provide specific percentages, absolute counts, or references to predefined splits for training, validation, and testing sets needed for reproduction.
Hardware Specification	Yes	For the experiments on the simulated datasets, all models are trained on single GPUs (mixture of Nvidia V100 and RTX 6000) and converge within 3 days. ... In the traffic domain, the models are trained in parallel on 4 GPUs due to the size of each scene (roughly 1000 tokens). Training takes less than one week for the baseline MTR model and under two weeks for SPARTAN.
Software Dependencies	No	The paper mentions 'Optimiser Adam' and 'Adam[20]' but does not provide specific version numbers for software libraries or environments like Python, PyTorch, or CUDA.
Experiment Setup	Yes	The hyperparameters for SPARTAN and the baselines are shown in table 3 and 4. ... Table 3: Hyperparameters for SPARTAN and the Transformer Baseline (values given as Interventional Pong / CREATE). Token Dimension: 32 / 64; Embedding Dimension: 512 / 512; n. transformer layers: 3 / 3; MLP hidden dimension: 512 / 1024; n. MLP layers per transformer layer: 3 / 3; lr: 5e-5 / 5e-5; Optimiser: Adam [20] / Adam. ... Table 4: Hyperparameters for the Global Graph Baseline (values given as Interventional Pong / CREATE). Token Dimension: 32 / 64; MLP hidden dimension: 1024 / 1024; n. MLP layers: 5 / 5; lr: 5e-5 / 5e-5; Optimiser: Adam / Adam.
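For anyone attempting a reproduction, the Table 3 values quoted above could be recorded in a small configuration object. The following is only a sketch: the class name, field names, and dictionary keys are hypothetical and are not taken from the authors' code; only the numeric values and the optimiser name come from the quoted table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpartanHyperparams:
    """Hypothetical container for the Table 3 values (names are assumptions)."""
    token_dim: int
    embedding_dim: int
    n_transformer_layers: int
    mlp_hidden_dim: int
    n_mlp_layers_per_transformer_layer: int
    lr: float
    optimiser: str


# Values copied from Table 3 as quoted in the Experiment Setup row.
TABLE_3 = {
    "interventional_pong": SpartanHyperparams(
        token_dim=32,
        embedding_dim=512,
        n_transformer_layers=3,
        mlp_hidden_dim=512,
        n_mlp_layers_per_transformer_layer=3,
        lr=5e-5,
        optimiser="Adam",
    ),
    "create": SpartanHyperparams(
        token_dim=64,
        embedding_dim=512,
        n_transformer_layers=3,
        mlp_hidden_dim=1024,
        n_mlp_layers_per_transformer_layer=3,
        lr=5e-5,
        optimiser="Adam",
    ),
}
```

Software versions, dataset splits, and Traffic-domain hyperparameters are not reported in the rows above, so they are deliberately left out of this sketch.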