reproducibilityindex.ai

Policy Gradient Bayesian Robust Optimization for Imitation Learning

Authors: Zaynah Javed, Daniel S Brown, Satvik Sharma, Jerry Zhu, Ashwin Balakrishna, Marek Petrik, Anca Dragan, Ken Goldberg

ICML 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To evaluate PG-BROIL, we consider settings where there is uncertainty over the true reward function. We first examine the setting where we have an a priori distribution over reward functions and find that PG-BROIL is able to optimize policies that effectively trade-off between expected and worst-case performance. Then, we leverage recent advances in efficient Bayesian reward inference (Brown et al., 2020a) to infer a posterior over reward functions from preferences over demonstrated trajectories. ... Results suggest that PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse and outperforms state-of-the-art imitation learning algorithms when learning from ambiguous demonstrations by hedging against uncertainty, rather than seeking to uniquely identify the demonstrator's reward function. 5. Experiments
Researcher Affiliation	Academia	1EECS Department, University of California, Berkeley 2CS Department, University of New Hampshire. Correspondence to: Daniel Brown <dsbrown@berkeley.edu>.
Pseudocode	Yes	Algorithm 1 Policy Gradient BROIL
Open Source Code	Yes	Code and videos are available at https://sites.google.com/view/pg-broil.
Open Datasets	Yes	We study 3 domains: the classical Cart Pole benchmark (Brockman et al., 2016), a pointmass navigation task inspired by (Thananjeyan et al., 2020b) and a robotic reaching task from the DM Control Suite (Tassa et al., 2020).
Dataset Splits	No	No explicit mention of specific train/validation/test dataset splits (percentages, counts, or predefined citations) was found. The experimental setup describes policy training via rollouts and subsequent testing.
Hardware Specification	No	No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments are provided.
Software Dependencies	No	The paper mentions OpenAI Spinning Up, REINFORCE, and PPO, but does not provide specific version numbers for these or other software dependencies.
Experiment Setup	Yes	For PG-BROIL, we set α = 0.95 and report results for the best λ (λ = 0.8). For PG-BROIL, we set α = 0.9 and report results for λ = 0.15. For PG-BROIL, we set α = 0.9 and report results for the best λ (λ = 0.3).
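
To make the roles of α and λ in the Experiment Setup row concrete, the sketch below assumes the usual BROIL form of the objective, λ · E[ρ] + (1 − λ) · CVaR_α[ρ], evaluated over a discrete set of reward hypotheses. The function names `cvar` and `broil_objective` and the example returns and probabilities are hypothetical illustrations; they are not taken from the authors' code or results.

```python
def cvar(returns, probs, alpha):
    """Lower-tail CVaR_alpha of a discrete return distribution (requires alpha < 1).

    Uses the Rockafellar-Uryasev form:
        max over sigma of  sigma - E[max(0, sigma - X)] / (1 - alpha).
    For a discrete distribution the maximum is attained at a support point,
    so it is enough to try each return value as sigma.
    """
    best = float("-inf")
    for sigma in returns:
        shortfall = sum(p * max(0.0, sigma - r) for r, p in zip(returns, probs))
        best = max(best, sigma - shortfall / (1.0 - alpha))
    return best


def broil_objective(returns, probs, alpha, lam):
    """Blend of expected return (weight lam) and CVaR_alpha (weight 1 - lam)."""
    expected = sum(p * r for r, p in zip(returns, probs))
    return lam * expected + (1.0 - lam) * cvar(returns, probs, alpha)


if __name__ == "__main__":
    # Illustrative values only: the policy's expected return under four
    # equally likely reward hypotheses.
    returns = [1.0, 2.0, 3.0, 4.0]
    probs = [0.25, 0.25, 0.25, 0.25]
    for lam in (0.0, 0.5, 1.0):
        print(lam, broil_objective(returns, probs, alpha=0.5, lam=lam))
```

Under this assumed form, λ = 1 reduces to the risk-neutral expected return and λ = 0 to pure CVaR, while a larger α restricts the CVaR term to a smaller worst-case tail of the reward distribution.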