Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Online Bayesian Goal Inference for Boundedly Rational Planning Agents
Authors: Tan Zhi-Xuan, Jordyn Mann, Tom Silver, Josh Tenenbaum, Vikash Mansinghka
NeurIPS 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present experiments showing that this modeling and inference architecture outperforms Bayesian inverse reinforcement learning baselines, accurately inferring goals from both optimal and non-optimal trajectories involving failure and back-tracking, while generalizing across domains with compositional structure and sparse rewards. |
| Researcher Affiliation | Academia | Tan Zhi-Xuan, Jordyn L. Mann, Tom Silver, Joshua B. Tenenbaum, Vikash K. Mansinghka Massachusetts Institute of Technology EMAIL |
| Pseudocode | Yes | Figure 3 (b) shows 'Boundedly-rational agent programs' with pseudocode for UPDATE-PLAN and SELECT-ACTION. Algorithm 1 is titled 'Sequential Inverse Plan Search (SIPS) for online Bayesian goal inference'. |
| Open Source Code | Yes | Code for the architecture and experiments presented in this paper is available at https://github.com/ztangent/Plinf.jl/tree/neurips-2020-experiments, as part of the Plinf.jl package for Bayesian inverse planning. |
| Open Datasets | Yes | We validate our approach on domains with varying degrees of complexity... Taxi (|G| = 3, |S| = 125): A benchmark domain used in hierarchical reinforcement learning [42]... |
| Dataset Splits | No | The paper states it runs 'each inference method on a dataset of optimal and non-optimal agent trajectories for each domain' and then evaluates accuracy, but does not specify explicit train/validation splits or percentages for these trajectories. |
| Hardware Specification | No | The paper does not explicitly describe any specific hardware used for running the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions 'Gen, a general-purpose probabilistic programming system' and 'Planning Domain Definition Language (PDDL)' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | Non-optimal trajectories were generated using the replanning agent model... with parameters r=2, q=0.95, γ=0.1. ...SIPS achieved good performance with 10 particles per goal... For baselines, we used a discount factor of 0.9, and Boltzmann noise parameter α=1. |
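
The Experiment Setup row mentions a Boltzmann noise parameter α=1 for the baselines. As a rough illustration of what such a parameter typically controls, the sketch below assumes the common convention that action probabilities are proportional to exp(α · Q). The function name `boltzmann_policy` and the Q-values are hypothetical, chosen for illustration only; they are not taken from the paper or from Plinf.jl.

```python
import math

def boltzmann_policy(q_values, alpha=1.0):
    """Return action probabilities proportional to exp(alpha * Q).

    Assumed convention: larger alpha means behavior closer to optimal,
    alpha = 0 means uniformly random actions.
    """
    # Subtract the maximum Q-value for numerical stability;
    # this leaves the normalized probabilities unchanged.
    q_max = max(q_values.values())
    weights = {a: math.exp(alpha * (q - q_max)) for a, q in q_values.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

# Hypothetical Q-values for a single state (illustrative, not from the paper).
q = {"left": 1.0, "right": 2.0, "stay": 0.0}
probs = boltzmann_policy(q, alpha=1.0)
```

With α=1, an action whose Q-value is one unit higher than another's is chosen e times as often, so the noise level only has meaning relative to the scale of the Q-values.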