Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

What do you know? Bayesian knowledge inference for navigating agents

Authors: Matthias Schultheis, Jana-Sophie Schönfeld, Constantin A Rothkopf, Heinz Koeppl

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical validation using simulated behavioral data and human data from an online experiment demonstrates that our model effectively captures human navigation under uncertainty and reveals interpretable insights into their environmental knowledge.
Researcher Affiliation	Academia	Matthias Schultheis Center for Cognitive Science Technische Universität Darmstadt Darmstadt, Germany EMAIL
Pseudocode	Yes	Algorithm 1 Bayesian knowledge inference Output: Samples {k_1, ..., k_L} from the distribution P(K | D) Input: Number of samples L, Partial world model W, Trajectory D
Open Source Code	Yes	An implementation of our algorithm is publicly available under the MIT Licence. For the planning algorithm, we adapted an implementation of D* lite, also available under the MIT License.
Open Datasets	Yes	To validate our method on human behavioral data, we developed a browser game that allowed human subjects to navigate through these partially occluded grid worlds. We conducted an online experiment with 52 participants recruited via the Prolific platform. The experiment was approved by the local institutional review board (IRB). Detailed information about the grid worlds and the experiment design is provided in Section A.4. [...] Both the code and dataset are publicly available.
Dataset Splits	No	The paper does not explicitly mention training/test/validation splits for its datasets in the traditional sense. It evaluates models on simulated data and human data from an online experiment (trajectories from 52 participants). Each trajectory is used for evaluation directly.
Hardware Specification	Yes	Computing resources used: Intel Xeon Platinum 9242 Processor (1 core per trajectory)
Software Dependencies	No	The paper mentions adapting an implementation of D* lite and refers to an A* algorithm package (pypi.org/project/astar/) but does not provide specific version numbers for these or other key software components used in the experiments.
Experiment Setup	Yes	Throughout the experiments, we used the following hyperparameters: Total number of samples for the inference: 1000; Number of samples rejected for burnin phase: 100; Parameter for the Ising model (spatial prior): β = 0.2; Cost for transitions from/to known fields: 1; Belief of traversability for unknown fields: q = 0.5; Softmax temperature for our proposed planning model: 1; Softmax temperature for the optimistic planning model: 0.8
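
For readers who want to reuse the reported setup, the hyperparameters in the Experiment Setup row can be collected in a small configuration object. The following is a minimal Python sketch and is not taken from the authors' released code: the class name, the field names, and the softmax_policy helper are hypothetical. It also assumes that the softmax is taken over negative path costs divided by the temperature, which the excerpt above does not confirm.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceConfig:
    """Hyperparameter values as reported in the Experiment Setup row.

    Field names are hypothetical; only the numeric values come from the page.
    """
    n_samples: int = 1000              # total number of samples for the inference
    n_burnin: int = 100                # samples rejected for the burn-in phase
    ising_beta: float = 0.2            # Ising model parameter (spatial prior)
    known_transition_cost: float = 1.0       # cost for transitions from/to known fields
    unknown_traversable_belief: float = 0.5  # belief q that an unknown field is traversable
    softmax_temp_proposed: float = 1.0       # temperature, proposed planning model
    softmax_temp_optimistic: float = 0.8     # temperature, optimistic planning model


def softmax_policy(costs, temperature):
    """Turn per-action costs into choice probabilities.

    Assumption: lower cost means higher probability, with
    p(a) proportional to exp(-cost(a) / temperature).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [-c / temperature for c in costs]
    shift = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


config = InferenceConfig()
# Example: three candidate actions with hypothetical expected path costs.
probs_proposed = softmax_policy([1.0, 2.0, 3.0], config.softmax_temp_proposed)
probs_optimistic = softmax_policy([1.0, 2.0, 3.0], config.softmax_temp_optimistic)
```

Under this assumed parameterization, the lower temperature of 0.8 concentrates more probability on the cheapest action than the temperature of 1 does. Whether the 1000 total samples include the 100 burn-in samples is not stated on this page, so the sketch stores both values without deriving a retained-sample count.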