Online Bayesian Goal Inference for Boundedly Rational Planning Agents
Authors: Tan Zhi-Xuan, Jordyn Mann, Tom Silver, Josh Tenenbaum, Vikash Mansinghka
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present experiments showing that this modeling and inference architecture outperforms Bayesian inverse reinforcement learning baselines, accurately inferring goals from both optimal and non-optimal trajectories involving failure and back-tracking, while generalizing across domains with compositional structure and sparse rewards. |
| Researcher Affiliation | Academia | Tan Zhi-Xuan, Jordyn L. Mann, Tom Silver, Joshua B. Tenenbaum, Vikash K. Mansinghka, Massachusetts Institute of Technology {xuan,jordynm,tslvr,jbt,vkm}@mit.edu |
| Pseudocode | Yes | Figure 3(b) shows 'Boundedly-rational agent programs' with pseudocode for UPDATE-PLAN and SELECT-ACTION. Algorithm 1 is titled 'Sequential Inverse Plan Search (SIPS) for online Bayesian goal inference'. (An illustrative sketch of this style of inference follows the table.) |
| Open Source Code | Yes | Code for the architecture and experiments presented in this paper is available at https://github.com/ztangent/Plinf.jl/tree/neurips-2020-experiments, as part of the Plinf.jl package for Bayesian inverse planning. |
| Open Datasets | Yes | We validate our approach on domains with varying degrees of complexity... Taxi (|G| = 3, |S| = 125): A benchmark domain used in hierarchical reinforcement learning [42]... |
| Dataset Splits | No | The paper states it runs 'each inference method on a dataset of optimal and non-optimal agent trajectories for each domain' and then evaluates accuracy, but does not specify explicit train/validation splits or percentages for these trajectories. |
| Hardware Specification | No | The paper does not explicitly describe any specific hardware used for running the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions 'Gen, a general-purpose probabilistic programming system' and 'Planning Domain Definition Language (PDDL)' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | Non-optimal trajectories were generated using the replanning agent model... with parameters r=2, q=0.95, γ=0.1. ...SIPS achieved good performance with 10 particles per goal... For baselines, we used a discount factor of 0.9, and Boltzmann noise parameter α=1. (See the sketches below the table for how the Boltzmann noise and particles-per-goal settings enter the agent model and inference.) |
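
The Experiment Setup row mentions a Boltzmann noise parameter α=1 for the baselines. The following is a minimal sketch of Boltzmann (softmax) action selection in Python, not the paper's Plinf.jl code; the function names and the example action values are assumptions for illustration, and the replanning agent parameters (r, q, γ) are not modeled here.

```python
import numpy as np

def boltzmann_action_probs(q_values, alpha=1.0):
    """Softmax over action values; larger alpha means closer to greedy."""
    z = alpha * np.asarray(q_values, dtype=float)
    z -= z.max()                          # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def select_action(q_values, alpha=1.0, rng=None):
    """Sample an action index according to its Boltzmann probability."""
    rng = rng or np.random.default_rng()
    probs = boltzmann_action_probs(q_values, alpha)
    return rng.choice(len(probs), p=probs)

# Example: three candidate actions with made-up values 1.0, 0.5, 0.0 and alpha = 1.
print(boltzmann_action_probs([1.0, 0.5, 0.0]))  # roughly [0.51, 0.31, 0.19]
print(select_action([1.0, 0.5, 0.0]))           # a sampled action index, e.g. 0
```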
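
The Pseudocode row references Algorithm 1, Sequential Inverse Plan Search (SIPS), which maintains a particle approximation to the posterior over goals and updates it online as actions are observed. The sketch below is a heavily simplified stand-in for that general idea, not the paper's algorithm: `action_likelihood` is a hypothetical placeholder for the boundedly-rational agent model, and particles within a goal are identical here, whereas in SIPS each particle also carries sampled planning state.

```python
import numpy as np

def action_likelihood(goal, state, action):
    """Hypothetical P(action | state, goal): favor moves toward the goal."""
    return 0.8 if action == np.sign(goal - state) else 0.2

def goal_inference(goals, trajectory, particles_per_goal=10):
    """Weight particles (one block per goal) by the observed actions."""
    particles = np.repeat(goals, particles_per_goal)   # goal hypothesis per particle
    log_w = np.zeros(len(particles))                   # log importance weights
    posteriors = []
    for state, action in trajectory:                   # online, one observation at a time
        for i, g in enumerate(particles):
            log_w[i] += np.log(action_likelihood(g, state, action))
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        # posterior over goals = total normalized weight of each goal's particles
        posteriors.append({int(g): float(w[particles == g].sum()) for g in goals})
    return posteriors

# Toy example: 1-D states, candidate goals at -3 and +3, agent observed moving right.
trajectory = [(0, 1), (1, 1), (2, 1)]
print(goal_inference(np.array([-3, 3]), trajectory)[-1])  # mass concentrates on +3
```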