Online Bayesian Goal Inference for Boundedly Rational Planning Agents

Authors: Tan Zhi-Xuan, Jordyn Mann, Tom Silver, Josh Tenenbaum, Vikash Mansinghka

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present experiments showing that this modeling and inference architecture outperforms Bayesian inverse reinforcement learning baselines, accurately inferring goals from both optimal and non-optimal trajectories involving failure and back-tracking, while generalizing across domains with compositional structure and sparse rewards.
Researcher Affiliation | Academia | Tan Zhi-Xuan, Jordyn L. Mann, Tom Silver, Joshua B. Tenenbaum, Vikash K. Mansinghka; Massachusetts Institute of Technology; {xuan,jordynm,tslvr,jbt,vkm}@mit.edu
Pseudocode | Yes | Figure 3(b) shows 'Boundedly-rational agent programs' with pseudocode for UPDATE-PLAN and SELECT-ACTION. Algorithm 1 is titled 'Sequential Inverse Plan Search (SIPS) for online Bayesian goal inference'. (Hedged sketches of both appear after this table.)
Open Source Code | Yes | Code for the architecture and experiments presented in this paper is available at https://github.com/ztangent/Plinf.jl/tree/neurips-2020-experiments, as part of the Plinf.jl package for Bayesian inverse planning.
Open Datasets | Yes | We validate our approach on domains with varying degrees of complexity... Taxi (|G| = 3, |S| = 125): A benchmark domain used in hierarchical reinforcement learning [42]...
Dataset Splits | No | The paper reports running 'each inference method on a dataset of optimal and non-optimal agent trajectories for each domain' and then evaluating accuracy, but it does not specify explicit train/validation splits or percentages for these trajectories.
Hardware Specification | No | The paper does not describe the hardware used to run the experiments (e.g., CPU or GPU models).
Software Dependencies | No | The paper mentions 'Gen, a general-purpose probabilistic programming system' and the Planning Domain Definition Language (PDDL), but it does not provide version numbers for these or any other software dependencies.
Experiment Setup | Yes | Non-optimal trajectories were generated using the replanning agent model... with parameters r=2, q=0.95, γ=0.1. ...SIPS achieved good performance with 10 particles per goal... For baselines, we used a discount factor of 0.9, and Boltzmann noise parameter α=1. (These values are collected in the configuration sketch below.)
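
The Pseudocode row points to Figure 3(b)'s UPDATE-PLAN and SELECT-ACTION routines. As a rough illustration only, here is a minimal Python sketch of such a replanning loop. The function names mirror the figure, but `planner`, `legal_actions`, the negative-binomial parameterization of the search budget, and the reading of γ as action noise are all assumptions, not the paper's exact definitions.

```python
import random

def sample_search_budget(r=2, q=0.95):
    # Negative-binomial draw: sum of r geometric "keep searching" runs,
    # each continuing with probability q (mean r*q/(1-q), about 38 here).
    # This parameterization is an assumption; the paper may use another.
    budget = 0
    for _ in range(r):
        while random.random() < q:
            budget += 1
    return budget

def update_plan(plan, state, goal, planner):
    # UPDATE-PLAN: if the current partial plan is exhausted, replan from
    # the current state under a freshly sampled, bounded node budget.
    # `planner` is a hypothetical stand-in for the paper's stochastic search.
    if plan:
        return plan
    budget = sample_search_budget()
    return planner(state, goal, max_nodes=budget)  # may be a partial plan

def select_action(plan, state, legal_actions, gamma=0.1):
    # SELECT-ACTION: execute the next planned action; with probability
    # gamma act randomly (assumed role for the quoted γ=0.1), which is one
    # way non-optimal trajectories with failures can arise.
    if plan and random.random() >= gamma:
        return plan.pop(0)
    return random.choice(legal_actions(state))
```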
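Algorithm 1 (SIPS) performs online goal inference with a particle filter over goals and latent plans. The sketch below shows the general shape under stated assumptions: `make_agent`, `action_loglik`, and `advance` are hypothetical callbacks standing in for the paper's agent model, and the unconditional multinomial resampling step stands in for whatever adaptive resampling and rejuvenation the full algorithm uses.

```python
import copy
import math
import random
from collections import defaultdict

def sips(goals, make_agent, action_loglik, advance, trajectory,
         particles_per_goal=10):
    # One particle = a goal hypothesis plus a simulated agent whose latent
    # (partial) plan explains the observations so far.
    particles = [{"goal": g, "agent": make_agent(g), "logw": 0.0}
                 for g in goals for _ in range(particles_per_goal)]
    posteriors = []
    for state, action in trajectory:
        # Reweight each particle by how likely its agent was to take the
        # observed action, then advance that agent's internal plan.
        for p in particles:
            p["logw"] += action_loglik(p["agent"], state, action)
            advance(p["agent"], state, action)
        # Normalize (log-sum-exp) and record the marginal posterior P(goal).
        m = max(p["logw"] for p in particles)
        weights = [math.exp(p["logw"] - m) for p in particles]
        z = sum(weights)
        post = defaultdict(float)
        for p, w in zip(particles, weights):
            post[p["goal"]] += w / z
        posteriors.append(dict(post))
        # Naive multinomial resampling; agents are deep-copied so that
        # duplicated particles evolve independently afterwards.
        particles = [copy.deepcopy(random.choices(particles,
                                                  weights=weights)[0])
                     for _ in particles]
        for p in particles:
            p["logw"] = 0.0
    return posteriors
```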
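For reference, the values quoted in the Experiment Setup row can be collected in one place. Only the numbers come from the row above; the dictionary keys are assumed names, not the paper's or Plinf.jl's identifiers.

```python
# Values quoted in the Experiment Setup row; key names are assumptions.
REPLANNING_AGENT_PARAMS = {"r": 2, "q": 0.95, "gamma": 0.1}
SIPS_PARAMS = {"particles_per_goal": 10}
BASELINE_PARAMS = {"discount": 0.9, "boltzmann_alpha": 1.0}
```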