Risk-Averse Bayes-Adaptive Reinforcement Learning
Authors: Marc Rigter, Bruno Lacerda, Nick Hawes
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that our approach significantly outperforms baseline approaches for this problem. Our empirical results show that our algorithm significantly outperforms baseline methods on two domains. |
| Researcher Affiliation | Academia | Marc Rigter, Oxford Robotics Institute, University of Oxford, mrigter@robots.ox.ac.uk; Bruno Lacerda, Oxford Robotics Institute, University of Oxford, bruno@robots.ox.ac.uk; Nick Hawes, Oxford Robotics Institute, University of Oxford, nickh@robots.ox.ac.uk |
| Pseudocode | Yes | Algorithm 1: Risk-Averse Bayes-Adaptive Monte Carlo Planning |
| Open Source Code | Yes | Code for the experiments is included in the supplementary material. |
| Open Datasets | Yes | Betting Game Domain: We adapt this domain from the literature on conditional value at risk in Markov decision processes [3] to the Bayes-adaptive setting. ... [3] Nicole Bäuerle and Jonathan Ott. Markov decision processes with average-value-at-risk criteria. Mathematical Methods of Operations Research, 74(3):361-379, 2011. |
| Dataset Splits | No | The paper describes experiments in reinforcement learning environments (Betting Game and Autonomous Car Navigation) where agents interact with a simulated environment. While it mentions 'training' for policy gradient methods, it does not specify traditional 'training/test/validation dataset splits' as would be found with static datasets. The environments are dynamic and generate data through interaction. |
| Hardware Specification | Yes | Computation times are reported for a 3.2 GHz Intel i7 processor with 64 GB of RAM. |
| Software Dependencies | No | All algorithms are implemented in C++ and Gurobi is used to solve linear programs for the value iteration (VI) approaches. The paper mentions software tools (C++, Gurobi) but does not provide specific version numbers for them. |
| Experiment Setup | Yes | For RA-BAMCP and BAMCP the MCTS exploration parameter was set to c_mcts = 2 based on empirical performance. For these methods, we performed 100,000 simulations for the initial step and 25,000 simulations per step thereafter. ... For RA-BAMCP, the rollout policy for the adversary was random, the progressive widening parameter was set to τ = 0.2, and the exploration parameter for the GP Bayesian optimisation was also set to c_bo = 2. |
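The experiment-setup row reports concrete MCTS hyperparameters (c_mcts = 2, τ = 0.2). The sketch below is not the authors' implementation (their code, per the supplementary material, is in C++ with Gurobi); it only illustrates, under common MCTS conventions, where such values would typically enter a planner: the UCB exploration constant in action selection and the progressive-widening exponent gating node expansion. All structure and names here are illustrative assumptions.

```cpp
// Hypothetical sketch: UCB1-style selection and progressive widening,
// parameterised by the values reported in the reproducibility table.
#include <cmath>
#include <cstddef>
#include <iostream>
#include <limits>
#include <vector>

struct ChildStats {
    double total_value = 0.0;  // sum of returns backed up through this child
    int visits = 0;            // number of times this child was selected
};

// UCB1-style score used to pick a child during tree traversal.
double ucbScore(const ChildStats& child, int parent_visits, double c_mcts) {
    if (child.visits == 0) return std::numeric_limits<double>::infinity();
    double mean = child.total_value / child.visits;
    double bonus = c_mcts *
        std::sqrt(std::log(static_cast<double>(parent_visits)) / child.visits);
    return mean + bonus;
}

// Progressive widening: permit a new child only while the number of children
// stays below ceil(parent_visits ^ tau), with tau = 0.2 as reported.
bool canAddChild(std::size_t num_children, int parent_visits, double tau) {
    double limit = std::ceil(std::pow(static_cast<double>(parent_visits), tau));
    return static_cast<double>(num_children) < limit;
}

int main() {
    const double c_mcts = 2.0;  // MCTS exploration parameter from the table
    const double tau = 0.2;     // progressive-widening parameter from the table

    std::vector<ChildStats> children = {{10.0, 5}, {3.0, 1}};
    int parent_visits = 6;

    for (std::size_t i = 0; i < children.size(); ++i) {
        std::cout << "child " << i << " UCB score: "
                  << ucbScore(children[i], parent_visits, c_mcts) << "\n";
    }
    std::cout << "may widen: " << std::boolalpha
              << canAddChild(children.size(), parent_visits, tau) << "\n";
    return 0;
}
```

The GP Bayesian-optimisation exploration parameter c_bo = 2 mentioned in the same row would play an analogous role in an acquisition function (e.g. an upper-confidence-bound rule over the adversary's perturbation), but the paper's exact formulation is not reproduced here.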