Risk-Averse Bayes-Adaptive Reinforcement Learning

Authors: Marc Rigter, Bruno Lacerda, Nick Hawes

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments demonstrate that our approach significantly outperforms baseline approaches for this problem." "Our empirical results show that our algorithm significantly outperforms baseline methods on two domains."
Researcher Affiliation | Academia | Marc Rigter, Oxford Robotics Institute, University of Oxford (mrigter@robots.ox.ac.uk); Bruno Lacerda, Oxford Robotics Institute, University of Oxford (bruno@robots.ox.ac.uk); Nick Hawes, Oxford Robotics Institute, University of Oxford (nickh@robots.ox.ac.uk)
Pseudocode | Yes | Algorithm 1: Risk-Averse Bayes-Adaptive Monte Carlo Planning
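The paper's Algorithm 1 is not reproduced here. As a loose illustration of the Bayes-adaptive idea it builds on (drawing a model from the posterior for each simulation and evaluating actions under the sampled model), the following toy C++ sketch estimates action values for a two-armed Bernoulli bandit with Beta posteriors. It omits the tree search, progressive widening, and the risk-averse (CVaR) adversary that distinguish RA-BAMCP; the toy problem and all names are hypothetical.

```cpp
// Toy sketch of the posterior-sampling idea behind Bayes-adaptive Monte
// Carlo planning: for each simulation, draw a model from the posterior and
// evaluate each action under that sampled model. This is NOT the paper's
// Algorithm 1 -- no tree search, no progressive widening, no CVaR adversary.
#include <array>
#include <iostream>
#include <random>
#include <utility>

// Draw a Beta(a, b) sample via two Gamma draws.
double sample_beta(double a, double b, std::mt19937& rng) {
    std::gamma_distribution<double> ga(a), gb(b);
    double x = ga(rng), y = gb(rng);
    return x / (x + y);
}

int main() {
    std::mt19937 rng(0);

    // Beta posteriors over the success probability of two Bernoulli arms,
    // e.g. Beta(4, 2) after 3 successes / 1 failure and Beta(2, 2) after
    // 1 success / 1 failure, starting from a uniform prior.
    const std::array<std::pair<double, double>, 2> posterior{{{4.0, 2.0}, {2.0, 2.0}}};

    const int n_sims = 100000;
    std::array<double, 2> value{0.0, 0.0};

    for (int s = 0; s < n_sims; ++s) {
        for (int a = 0; a < 2; ++a) {
            // "Root sampling": draw this arm's success probability from its
            // posterior, then simulate one pull under the sampled model.
            double p = sample_beta(posterior[a].first, posterior[a].second, rng);
            value[a] += std::bernoulli_distribution(p)(rng) ? 1.0 : 0.0;
        }
    }

    for (int a = 0; a < 2; ++a)
        std::cout << "estimated value of arm " << a << ": "
                  << value[a] / n_sims << "\n";
    return 0;
}
```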
Open Source Code | Yes | "Code for the experiments is included in the supplementary material."
Open Datasets | Yes | "Betting Game Domain: We adapt this domain from the literature on conditional value at risk in Markov decision processes [3] to the Bayes-adaptive setting." ... [3] Nicole Bäuerle and Jonathan Ott. Markov decision processes with average-value-at-risk criteria. Mathematical Methods of Operations Research, 74(3):361-379, 2011.
Dataset Splits | No | The paper describes experiments in reinforcement learning environments (Betting Game and Autonomous Car Navigation) where agents interact with a simulated environment. While it mentions 'training' for the policy gradient methods, it does not specify traditional training/test/validation dataset splits as would be found with static datasets. The environments are dynamic and generate data through interaction.
Hardware Specification | Yes | "Computation times are reported for a 3.2 GHz Intel i7 processor with 64 GB of RAM."
Software Dependencies | No | "All algorithms are implemented in C++ and Gurobi is used to solve linear programs for the value iteration (VI) approaches." The paper mentions software tools (C++, Gurobi) but does not provide specific version numbers for them.
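For context on the Gurobi dependency: inner optimisations in CVaR value iteration are commonly written in the dual (risk-envelope) form of CVaR, which is a small linear program. The snippet below is a hypothetical illustration of solving such an LP with Gurobi's C++ API; it is an assumption about how the dependency might be used, not code from the paper's supplementary material.

```cpp
// Sketch of the kind of inner LP a CVaR value-iteration step might hand to
// Gurobi (dual / risk-envelope form of CVaR). Hypothetical illustration only.
#include "gurobi_c++.h"
#include <iostream>
#include <vector>

// CVaR_alpha of successor values V under transition probabilities p:
//   min_xi  sum_i p_i * xi_i * V_i
//   s.t.    0 <= xi_i <= 1/alpha,   sum_i p_i * xi_i = 1
double cvar_lp(const std::vector<double>& p,
               const std::vector<double>& V,
               double alpha) {
    GRBEnv env;
    GRBModel model(env);
    std::vector<GRBVar> xi(p.size());
    GRBLinExpr objective = 0, probMass = 0;
    for (size_t i = 0; i < p.size(); ++i) {
        xi[i] = model.addVar(0.0, 1.0 / alpha, 0.0, GRB_CONTINUOUS);
        objective += p[i] * V[i] * xi[i];
        probMass  += p[i] * xi[i];
    }
    model.addConstr(probMass == 1.0);            // distorted weights remain a distribution
    model.setObjective(objective, GRB_MINIMIZE); // adversary picks the worst-case distortion
    model.optimize();
    return model.get(GRB_DoubleAttr_ObjVal);
}

int main() {
    try {
        // Two successor states with values 0 and 10, each reached with
        // probability 0.5; CVaR at level alpha = 0.25 is 0 (worst 25% tail).
        std::cout << cvar_lp({0.5, 0.5}, {0.0, 10.0}, 0.25) << std::endl;
    } catch (GRBException& e) {
        std::cerr << e.getMessage() << std::endl;
    }
    return 0;
}
```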
Experiment Setup | Yes | "For RA-BAMCP and BAMCP the MCTS exploration parameter was set to c_mcts = 2 based on empirical performance. For these methods, we performed 100,000 simulations for the initial step and 25,000 simulations per step thereafter." ... "For RA-BAMCP, the rollout policy for the adversary was random, the progressive widening parameter was set to τ = 0.2, and the exploration parameter for the GP Bayesian optimisation was also set to c_bo = 2."
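For readability, the reported planning settings can be collected in a single configuration object. The C++ struct below is a hypothetical summary; the field names are illustrative and not taken from the released code.

```cpp
// Hypothetical configuration struct summarising the planning hyperparameters
// reported above (names are illustrative, not from the paper's code).
struct RABAMCPConfig {
    double c_mcts        = 2.0;     // UCB exploration constant for the MCTS tree policy
    int    sims_initial  = 100000;  // simulations for the first decision step
    int    sims_per_step = 25000;   // simulations for every step thereafter
    double tau           = 0.2;     // progressive-widening parameter for the adversary's actions
    double c_bo          = 2.0;     // exploration constant for the GP Bayesian optimisation
    bool   random_adversary_rollout = true; // adversary rollout policy is uniform random
};
```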