Model-based Lifelong Reinforcement Learning with Bayesian Exploration
Authors: Haotian Fu, Shangqun Yu, Michael Littman, George Konidaris
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on several challenging domains show that our algorithms achieve both better forward and backward transfer performance than state-of-the-art lifelong RL methods. |
| Researcher Affiliation | Academia | Haotian Fu, Shangqun Yu, Michael Littman, George Konidaris Department of Computer Science, Brown University {hfu7,syu68,mlittman,gdk}@cs.brown.edu |
| Pseudocode | Yes | The detailed algorithm is summarized in Algorithm 1. ... We show the detailed backward transfer algorithm in Algorithm 2. |
| Open Source Code | Yes | Code repository available at https://github.com/Minusadd/VBLRL. |
| Open Datasets | Yes | We evaluated the performance of VBLRL on HiP-MDP versions of several continuous control tasks from the Mujoco physics simulator [45], HalfCheetah-gravity, HalfCheetah-bodyparts, Hopper-gravity, Hopper-bodyparts, Walker-gravity, Walker-bodyparts, all of which are lifelong-RL benchmarks used in prior work [31]. |
| Dataset Splits | No | The paper specifies training iterations and task sequences common in RL, such as '100 iterations for each task and a horizon of 100 (Halfcheetah) or 400 (Hopper & Walker) for each iteration,' but does not describe traditional dataset splits (e.g., percentages or counts for training, validation, and test sets). |
| Hardware Specification | No | The paper mentions that research was conducted using 'computational resources and services at the Center for Computation and Visualization, Brown University' and discusses training times, but does not specify any particular GPU models, CPU models, or other hardware configurations. |
| Software Dependencies | No | The paper mentions general software components like 'Bayesian neural networks' and uses simulators like 'Mujoco' and 'Meta-World', but it does not provide specific version numbers for any software dependencies (e.g., programming languages, libraries, frameworks, or simulators). |
| Experiment Setup | Yes | We substantially reduced the number of iterations that the agent can sample and train on: 100 iterations for each task and a horizon of 100 (Halfcheetah) or 400 (Hopper & Walker) for each iteration. ... For planning, at each step we begin by creating P particles from the current state... Then, we sample N candidate action sequences... |
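
The planning procedure quoted in the Experiment Setup row (create P particles from the current state, sample N candidate action sequences, roll them out, and act) follows a standard sampling-based model-predictive control pattern. Below is a minimal sketch of that pattern, not the authors' actual implementation: `dynamics_model` and `reward_fn` are hypothetical stand-ins for a stochastic model (e.g., one sampled per particle from a Bayesian neural network posterior) and a task reward, and all hyperparameter names are illustrative.

```python
# Minimal sketch of particle-based random-shooting planning (assumed API;
# `dynamics_model` and `reward_fn` are hypothetical stand-ins).
import numpy as np

def plan_action(state, dynamics_model, reward_fn,
                num_particles=20, num_candidates=500,
                horizon=25, action_dim=6, action_scale=1.0):
    """Evaluate N candidate action sequences over P particles and
    return the first action of the highest-return sequence."""
    # Sample N candidate action sequences uniformly within action bounds.
    candidates = np.random.uniform(
        -action_scale, action_scale,
        size=(num_candidates, horizon, action_dim))

    returns = np.zeros(num_candidates)
    for i, actions in enumerate(candidates):
        # Create P particles, all initialized at the current state.
        particles = np.repeat(state[None, :], num_particles, axis=0)
        total = 0.0
        for t in range(horizon):
            act = np.repeat(actions[t][None, :], num_particles, axis=0)
            # Each particle is propagated through the stochastic model,
            # so particle spread reflects model uncertainty.
            particles = dynamics_model(particles, act)
            total += reward_fn(particles, act).mean()
        returns[i] = total

    best = int(np.argmax(returns))
    # MPC-style: execute only the first action, then replan next step.
    return candidates[best, 0]
```

In this sketch, averaging rewards across particles scores each candidate sequence under model uncertainty; replanning at every step matches the "at each step" phrasing in the quoted setup.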