Online Bayesian Goal Inference for Boundedly Rational Planning Agents

Authors: Tan Zhi-Xuan, Jordyn Mann, Tom Silver, Josh Tenenbaum, Vikash Mansinghka

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present experiments showing that this modeling and inference architecture outperforms Bayesian inverse reinforcement learning baselines, accurately inferring goals from both optimal and non-optimal trajectories involving failure and back-tracking, while generalizing across domains with compositional structure and sparse rewards.
Researcher Affiliation | Academia | Tan Zhi-Xuan, Jordyn L. Mann, Tom Silver, Joshua B. Tenenbaum, Vikash K. Mansinghka; Massachusetts Institute of Technology; {xuan,jordynm,tslvr,jbt,vkm}@mit.edu
Pseudocode | Yes | Figure 3(b) shows 'Boundedly-rational agent programs' with pseudocode for UPDATE-PLAN and SELECT-ACTION. Algorithm 1 is titled 'Sequential Inverse Plan Search (SIPS) for online Bayesian goal inference'. (Hedged sketches of both appear after this table.)
Open Source Code | Yes | Code for the architecture and experiments presented in this paper is available at https://github.com/ztangent/Plinf.jl/tree/neurips-2020-experiments, as part of the Plinf.jl package for Bayesian inverse planning.
Open Datasets | Yes | We validate our approach on domains with varying degrees of complexity... Taxi (|G| = 3, |S| = 125): A benchmark domain used in hierarchical reinforcement learning [42]...
Dataset Splits | No | The paper reports running 'each inference method on a dataset of optimal and non-optimal agent trajectories for each domain' and then evaluating accuracy, but it does not specify explicit train/validation splits or percentages for these trajectories.
Hardware Specification | No | The paper does not describe the hardware used to run the experiments (e.g., CPU or GPU models).
Software Dependencies | No | The paper mentions 'Gen, a general-purpose probabilistic programming system' and the Planning Domain Definition Language (PDDL), but it does not provide version numbers for these or any other software dependencies.
Experiment Setup | Yes | Non-optimal trajectories were generated using the replanning agent model... with parameters r=2, q=0.95, γ=0.1. ...SIPS achieved good performance with 10 particles per goal... For baselines, we used a discount factor of 0.9, and Boltzmann noise parameter α=1. (These values are collected in the configuration sketch below.)
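
The Pseudocode row points to Figure 3(b)'s UPDATE-PLAN and SELECT-ACTION routines. As a rough illustration only, here is a minimal Python sketch of such a replanning loop. The function names mirror the figure, but `planner`, `legal_actions`, the negative-binomial parameterization of the search budget, and the reading of γ as action noise are all assumptions, not the paper's exact definitions.

```python
import random

def sample_search_budget(r=2, q=0.95):
    # Negative-binomial draw: sum of r geometric "keep searching" runs,
    # each continuing with probability q (mean r*q/(1-q), about 38 here).
    # This parameterization is an assumption; the paper may use another.
    budget = 0
    for _ in range(r):
        while random.random() < q:
            budget += 1
    return budget

def update_plan(plan, state, goal, planner):
    # UPDATE-PLAN: if the current partial plan is exhausted, replan from
    # the current state under a freshly sampled, bounded node budget.
    # `planner` is a hypothetical stand-in for the paper's stochastic search.
    if plan:
        return plan
    budget = sample_search_budget()
    return planner(state, goal, max_nodes=budget)  # may be a partial plan

def select_action(plan, state, legal_actions, gamma=0.1):
    # SELECT-ACTION: execute the next planned action; with probability
    # gamma act randomly (assumed role for the quoted γ=0.1), which is one
    # way non-optimal trajectories with failures can arise.
    if plan and random.random() >= gamma:
        return plan.pop(0)
    return random.choice(legal_actions(state))
```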
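Algorithm 1 (SIPS) performs online goal inference with a particle filter over goals and latent plans. The sketch below shows the general shape under stated assumptions: `make_agent`, `action_loglik`, and `advance` are hypothetical callbacks standing in for the paper's agent model, and the unconditional multinomial resampling step stands in for whatever adaptive resampling and rejuvenation the full algorithm uses.

```python
import copy
import math
import random
from collections import defaultdict

def sips(goals, make_agent, action_loglik, advance, trajectory,
         particles_per_goal=10):
    # One particle = a goal hypothesis plus a simulated agent whose latent
    # (partial) plan explains the observations so far.
    particles = [{"goal": g, "agent": make_agent(g), "logw": 0.0}
                 for g in goals for _ in range(particles_per_goal)]
    posteriors = []
    for state, action in trajectory:
        # Reweight each particle by how likely its agent was to take the
        # observed action, then advance that agent's internal plan.
        for p in particles:
            p["logw"] += action_loglik(p["agent"], state, action)
            advance(p["agent"], state, action)
        # Normalize (log-sum-exp) and record the marginal posterior P(goal).
        m = max(p["logw"] for p in particles)
        weights = [math.exp(p["logw"] - m) for p in particles]
        z = sum(weights)
        post = defaultdict(float)
        for p, w in zip(particles, weights):
            post[p["goal"]] += w / z
        posteriors.append(dict(post))
        # Naive multinomial resampling; agents are deep-copied so that
        # duplicated particles evolve independently afterwards.
        particles = [copy.deepcopy(random.choices(particles,
                                                  weights=weights)[0])
                     for _ in particles]
        for p in particles:
            p["logw"] = 0.0
    return posteriors
```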
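For reference, the values quoted in the Experiment Setup row can be collected in one place. Only the numbers come from the row above; the dictionary keys are assumed names, not the paper's or Plinf.jl's identifiers.

```python
# Values quoted in the Experiment Setup row; key names are assumptions.
REPLANNING_AGENT_PARAMS = {"r": 2, "q": 0.95, "gamma": 0.1}
SIPS_PARAMS = {"particles_per_goal": 10}
BASELINE_PARAMS = {"discount": 0.9, "boltzmann_alpha": 1.0}
```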