Model-based Lifelong Reinforcement Learning with Bayesian Exploration
Authors: Haotian Fu, Shangqun Yu, Michael Littman, George Konidaris
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on several challenging domains show that our algorithms achieve both better forward and backward transfer performance than state-of-the-art lifelong RL methods. |
| Researcher Affiliation | Academia | Haotian Fu, Shangqun Yu, Michael Littman, George Konidaris Department of Computer Science, Brown University {hfu7,syu68,mlittman,gdk}@cs.brown.edu |
| Pseudocode | Yes | The detailed algorithm is summarized in Algorithm 1. ... We show the detailed backward transfer algorithm in Algorithm 2. |
| Open Source Code | Yes | Code repository available at https://github.com/Minusadd/VBLRL. |
| Open Datasets | Yes | We evaluated the performance of VBLRL on HiP-MDP versions of several continuous control tasks from the Mujoco physics simulator [45], HalfCheetah-gravity, HalfCheetah-bodyparts, Hopper-gravity, Hopper-bodyparts, Walker-gravity, Walker-bodyparts, all of which are lifelong-RL benchmarks used in prior work [31]. |
| Dataset Splits | No | The paper specifies training iterations and task sequences common in RL, such as '100 iterations for each task and a horizon of 100 (Halfcheetah) or 400 (Hopper & Walker) for each iteration,' but does not describe traditional dataset splits (e.g., percentages or counts for training, validation, and test sets). |
| Hardware Specification | No | The paper mentions that research was conducted using 'computational resources and services at the Center for Computation and Visualization, Brown University' and discusses training times, but does not specify any particular GPU models, CPU models, or other hardware configurations. |
| Software Dependencies | No | The paper mentions general software components like 'Bayesian neural networks' and uses simulators like 'Mujoco' and 'Meta-World', but it does not provide specific version numbers for any software dependencies (e.g., programming languages, libraries, frameworks, or simulators). |
| Experiment Setup | Yes | We substantially reduced the number of iterations that the agent can sample and train on: 100 iterations for each task and a horizon of 100 (Halfcheetah) or 400 (Hopper & Walker) for each iteration. ... For planning, at each step we begin by creating P particles from the current state... Then, we sample N candidate action sequences... |
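
The planning procedure quoted in the Experiment Setup row (create P particles from the current state, sample N candidate action sequences, roll them out, and act) follows a standard sampling-based model-predictive control pattern. Below is a minimal sketch of that pattern, not the authors' actual implementation: `dynamics_model` and `reward_fn` are hypothetical stand-ins for a stochastic model (e.g., one sampled per particle from a Bayesian neural network posterior) and a task reward, and all hyperparameter names are illustrative.

```python
# Minimal sketch of particle-based random-shooting planning (assumed API;
# `dynamics_model` and `reward_fn` are hypothetical stand-ins).
import numpy as np

def plan_action(state, dynamics_model, reward_fn,
                num_particles=20, num_candidates=500,
                horizon=25, action_dim=6, action_scale=1.0):
    """Evaluate N candidate action sequences over P particles and
    return the first action of the highest-return sequence."""
    # Sample N candidate action sequences uniformly within action bounds.
    candidates = np.random.uniform(
        -action_scale, action_scale,
        size=(num_candidates, horizon, action_dim))

    returns = np.zeros(num_candidates)
    for i, actions in enumerate(candidates):
        # Create P particles, all initialized at the current state.
        particles = np.repeat(state[None, :], num_particles, axis=0)
        total = 0.0
        for t in range(horizon):
            act = np.repeat(actions[t][None, :], num_particles, axis=0)
            # Each particle is propagated through the stochastic model,
            # so particle spread reflects model uncertainty.
            particles = dynamics_model(particles, act)
            total += reward_fn(particles, act).mean()
        returns[i] = total

    best = int(np.argmax(returns))
    # MPC-style: execute only the first action, then replan next step.
    return candidates[best, 0]
```

In this sketch, averaging rewards across particles scores each candidate sequence under model uncertainty; replanning at every step matches the "at each step" phrasing in the quoted setup.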