Bayesian Robust Optimization for Imitation Learning

Authors: Daniel Brown, Scott Niekum, Marek Petrik

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our empirical results show that BROIL provides a natural way to interpolate between return-maximizing and risk-minimizing behaviors and outperforms existing risk-sensitive and risk-neutral inverse reinforcement learning algorithms. In the next two sections we explore two case studies that highlight the performance and benefits of using BROIL for robust policy optimization. We sampled 2000 reward functions from the prior distributions over costs and computed the CVaR optimal policy with α = 0.99 for different values of λ. Figure 5 shows that both formulations of BROIL significantly outperform MaxEnt IRL and LPAL." (See the CVaR sketch after this table.)
Researcher Affiliation | Academia | Daniel S. Brown, UC Berkeley (dsbrown@berkeley.edu); Scott Niekum, University of Texas at Austin (sniekum@cs.utexas.edu); Marek Petrik, University of New Hampshire (mpetrik@cs.unh.edu)
Pseudocode | No | The paper includes mathematical formulations but does not contain structured pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Code to reproduce the experiments is available at https://github.com/dsbrown1331/broil
Open Datasets | No | The paper describes generating samples or using a single demonstration, but does not provide concrete access information (specific link, DOI, repository name, or formal citation with authors/year) for a publicly available or open dataset.
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or a detailed splitting methodology) needed to reproduce the data partitioning.
Hardware Specification | No | The paper mentions running experiments on "a personal laptop" but does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers.
Experiment Setup | Yes | "We sampled 2000 reward functions from the prior distributions over costs and computed the CVaR optimal policy with α = 0.99 for different values of λ. Given the single demonstration, we generated 2000 samples from the posterior P(R | D) using Bayesian IRL [46]. We used a relatively small inverse temperature parameter (β = 10)." (See the Bayesian IRL likelihood sketch after this table.)
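
The Research Type row quotes the paper's claim that BROIL interpolates between return-maximizing and risk-minimizing behaviors via a λ-weighted combination of expected return and CVaR over sampled reward functions. The sketch below is a minimal illustration of that trade-off, not the paper's implementation: it assumes the convention that CVaR_α averages the worst (1 − α) fraction of sampled returns, and the assignment of λ versus (1 − λ) to the two terms is an assumption to be checked against the paper. The function names and the synthetic return samples are hypothetical.

```python
import numpy as np

def empirical_cvar(returns, alpha=0.99):
    """Average of the worst (1 - alpha) fraction of sampled returns.

    Assumes the lower-tail convention for CVaR used in risk-averse
    policy optimization (higher return is better).
    """
    returns = np.sort(np.asarray(returns, dtype=float))   # ascending: worst outcomes first
    k = max(1, int(np.ceil((1.0 - alpha) * returns.size)))
    return returns[:k].mean()

def broil_style_objective(returns, lam, alpha=0.99):
    """Soft-robust trade-off between expected return and CVaR.

    Here lam = 0 gives the risk-neutral (expected-return) objective and
    lam = 1 gives the purely risk-averse CVaR objective; which endpoint
    the paper's lambda corresponds to is an assumption in this sketch.
    """
    return (1.0 - lam) * np.mean(returns) + lam * empirical_cvar(returns, alpha)

# Hypothetical usage: 2000 returns of a fixed policy evaluated under reward
# functions sampled from a prior/posterior, mirroring the sample count in the
# experiment setup (the synthetic Gaussian samples below are illustrative only).
rng = np.random.default_rng(0)
sampled_returns = rng.normal(loc=1.0, scale=0.5, size=2000)
for lam in (0.0, 0.5, 1.0):
    print(f"lambda={lam:.1f}  objective={broil_style_objective(sampled_returns, lam):.3f}")
```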
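The Experiment Setup row also quotes an inverse temperature β = 10 for the Bayesian IRL posterior sampling. As a rough illustration of what β controls, here is a minimal sketch of the per-state Boltzmann (softmax) demonstration likelihood commonly used in Bayesian IRL implementations; the exact likelihood in the cited Bayesian IRL method and in the paper's code may differ, computing the optimal Q-values is omitted, and `demo_log_likelihood` and `q_values` are hypothetical names.

```python
import numpy as np
from scipy.special import logsumexp  # numerically stable log-sum-exp for beta * Q

def demo_log_likelihood(demos, q_values, beta=10.0):
    """Boltzmann-rational demonstration log-likelihood:

        log P(D | R) = sum over (s, a) in D of
                       beta * Q*(s, a) - logsumexp_b( beta * Q*(s, b) )

    `q_values[s]` is assumed to hold the optimal Q-values Q*(s, .) under the
    candidate reward R (e.g. computed by value iteration, not shown here).
    Larger beta concentrates the likelihood on near-optimal demonstrated actions.
    """
    total = 0.0
    for s, a in demos:
        q = np.asarray(q_values[s], dtype=float)
        total += beta * q[a] - logsumexp(beta * q)
    return total

# Hypothetical usage: a 2-state, 2-action toy problem with one demonstrated pair.
q_values = {0: [1.0, 0.2], 1: [0.5, 0.9]}
demos = [(0, 0)]
print(demo_log_likelihood(demos, q_values, beta=10.0))
```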