Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].

Estimating cognitive biases with attention-aware inverse planning

Authors: Sounak Banerjee, Daphne Cornelisse, Deepak Gopinath, Emily Sumner, Jonathan DeCastro, Guy Rosman, Eugene Vinitsky, Mark K Ho

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct three experiments to validate the attention-aware inverse planning approach to behavior modeling. In Section 4.1 we show how it can be used to infer heuristics and attentional biases from behavior in a tabular setting. In Section 4.2, we provide evidence that, unlike our proposed approach, standard approaches like inverse reinforcement learning [Abbeel and Ng, 2004] are not guaranteed to recover attention-limited decision-making. Finally, in Section 4.3 we describe an approach to attention-aware inverse planning in a high-fidelity driving simulator using real-world scenarios.
Researcher Affiliation	Collaboration	(1) New York University; (2) Toyota Research Institute
Pseudocode	Yes	Algorithm 1 Behavioral Utility Calculation (Appendix D) and Algorithm 2 Inference Logic (Appendix E)
Open Source Code	Yes	Code for replicating the experiments discussed in this section is available at: https://github.com/sounakban/gpudrive-CoDec/tree/NeurIPS-2025.
Open Datasets	Yes	Finally, we develop an approach for inferring biases from synthetically generated behavior in real-world driving scenarios selected from the Waymo Open Motion Dataset [Ettinger et al., 2021], demonstrating the feasibility of scaling up attention-aware inverse planning to complex, naturalistic domains.
Dataset Splits	No	The paper describes generating synthetic data for inference (e.g., 'sampling 125 construals and trajectories across the 25 scenarios' and 'sampled 80 trajectories from 10 scenarios per agent'). However, it does not specify traditional training, validation, or test dataset splits for model training or evaluation in the context of supervised learning.
Hardware Specification	Yes	All simulations were run on an Ubuntu High Performance Computing cluster. A single RTX8000 GPU and 40GB of memory were allocated for the process. Finally, we performed inference over heuristic parameters on a laptop computer (with an Intel Core Ultra 7 165H CPU and 32GB of RAM).
Software Dependencies	No	We calculate these quantities exactly using dynamic programming in JAX [Bradbury et al., 2018] and then solve for λ using off-the-shelf optimizers [Virtanen et al., 2020]. The citations refer to specific software libraries but their explicit version numbers are not provided in the text.
Experiment Setup	Yes	For each scenario, we computed the behavioral utility, V̂(s, π_C), ... across N rollouts in the true state (algorithm pseudocode available in Appendix D). For example, if there were 15 vehicles in a scene, then 15 different construed models were each rolled out N times... (N=40 from Algorithm 1). A biased construal selection policy for the specific set of λ-values was first used to sample eight construals (with replacement) from each of the 10 scenes. We uniformly sampled 215 sets of λ-values for the three heuristics, within a reasonable range of values for each heuristic. Finally, we performed maximum likelihood estimation using Bayesian optimization [Nogueira, 2014] to recover the true λs...
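
The sample-then-recover loop quoted in the Experiment Setup row can be illustrated with a toy sketch. This is not the authors' implementation, and several pieces are assumptions made only for illustration: the softmax form of the construal selection policy, the random utilities and heuristic features, the five construals per scene, and the names `selection_probs`, `log_likelihood`, and `make_scene`. A coarse grid search stands in for the Bayesian optimization cited in the paper. Only the counts (three heuristics, 10 scenes, eight construals sampled with replacement per scene) come from the quoted text.

```python
import itertools
import math
import random


def selection_probs(utilities, features, lambdas):
    """Assumed policy: softmax over utility minus lambda-weighted heuristic features."""
    scores = [
        u - sum(lam * f for lam, f in zip(lambdas, feats))
        for u, feats in zip(utilities, features)
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def log_likelihood(lambdas, scenes, choices):
    """Log-probability of the sampled construal indices under the given lambdas."""
    ll = 0.0
    for (utilities, features), picked in zip(scenes, choices):
        probs = selection_probs(utilities, features, lambdas)
        ll += sum(math.log(probs[i]) for i in picked)
    return ll


def make_scene(rng, n_construals=5, n_heuristics=3):
    """Hypothetical scene: random behavioral utilities and heuristic features."""
    utilities = [rng.uniform(0.0, 1.0) for _ in range(n_construals)]
    features = [
        [rng.uniform(0.0, 1.0) for _ in range(n_heuristics)]
        for _ in range(n_construals)
    ]
    return utilities, features


rng = random.Random(0)
true_lambdas = (1.0, 2.0, 0.5)  # illustrative values, not taken from the paper

# 10 scenes, eight construals sampled with replacement from each.
scenes = [make_scene(rng) for _ in range(10)]
choices = [
    rng.choices(range(len(u)), weights=selection_probs(u, f, true_lambdas), k=8)
    for u, f in scenes
]

# Maximum likelihood by exhaustive grid search (stand-in for Bayesian optimization).
grid = [0.5 * i for i in range(9)]  # 0.0, 0.5, ..., 4.0
fitted = max(
    itertools.product(grid, repeat=3),
    key=lambda lam: log_likelihood(lam, scenes, choices),
)
print("fitted lambdas:", fitted)
```

With only 80 sampled construals the fitted values need not match the generating λs closely. The sketch guarantees only that the fitted point has a likelihood at least as high as the generating one, since the latter lies on the grid.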