Reward Uncertainty for Exploration in Preference-based Reinforcement Learning

Authors: Xinran Liang, Katherine Shu, Kimin Lee, Pieter Abbeel

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that an exploration bonus from uncertainty in the learned reward improves both feedback- and sample-efficiency of preference-based RL algorithms on complex robot manipulation tasks from the Meta-World benchmark, compared with other existing exploration methods that measure the novelty of state visitation.
Researcher Affiliation | Academia | Xinran Liang1, Katherine Shu1, Kimin Lee1, Pieter Abbeel1 (1 University of California, Berkeley)
Pseudocode | Yes | The full procedure of RUNE is summarized in Algorithm 1. [...] Algorithm 1 RUNE: Reward Uncertainty for Exploration. A hedged sketch of the exploration bonus this algorithm computes appears after the table.
Open Source Code | No | The paper references publicly released implementations of PEBBLE (https://github.com/pokaxpoka/B_Pref), SAC (https://github.com/denisyarats/pytorch_sac), RE3 (https://github.com/younggyoseo/RE3), and URLB (https://anonymous.4open.science/r/urlb). However, it does not state that the authors' own RUNE implementation is open-source or otherwise available.
Open Datasets | Yes | In order to verify the efficacy of exploration in preference-based RL, we focus on having an agent solve a range of complex robotic manipulation skills from Meta-World (Yu et al., 2020). A minimal environment-loading example follows the table.
Dataset Splits | No | The paper uses the Meta-World benchmark but does not specify explicit training, validation, and test dataset splits (e.g., percentages or sample counts) needed for reproduction. It mentions using an "oracle scripted teacher" for feedback.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper mentions the Adam optimizer (Kingma & Ba, 2015) and refers to public repositories for the PEBBLE, SAC, RE3, and URLB algorithms. However, it does not specify version numbers for any software dependency (e.g., Python, PyTorch, or other libraries), which are crucial for a reproducible setup.
Experiment Setup | Yes | Table 2: Hyperparameters of the PEBBLE algorithm. Table 3: Hyperparameters of the SAC algorithm. For all methods we consider, we carefully tune a range of hyperparameters and report the best results. In particular, we consider β0 = 0.05 and ρ ∈ {0.001, 0.0001, 0.00001} for all exploration methods, and k ∈ {5, 10} for state entropy based exploration. The sweep is sketched below the table.
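As a reading aid for the Pseudocode row, here is a minimal sketch of the RUNE-style exploration bonus, assuming (as the paper describes) an ensemble of learned reward models whose mean serves as the exploitation reward and whose standard deviation serves as the uncertainty bonus, with the trade-off coefficient decaying as β_t = β_0(1 − ρ)^t. The function name `rune_reward` and the `reward_ensemble` argument are placeholders, not identifiers from the authors' code.

```python
import torch

def rune_reward(obs, act, reward_ensemble, step, beta_0=0.05, rho=0.0001):
    """Sketch of a RUNE-style reward: mean ensemble prediction plus an
    uncertainty (ensemble standard deviation) exploration bonus."""
    # Each ensemble member maps (obs, act) to a scalar reward prediction.
    preds = torch.stack([r_hat(obs, act) for r_hat in reward_ensemble], dim=0)
    r_exploit = preds.mean(dim=0)          # agreement across the ensemble
    r_explore = preds.std(dim=0)           # disagreement = reward uncertainty
    beta_t = beta_0 * (1.0 - rho) ** step  # decaying trade-off schedule
    return r_exploit + beta_t * r_explore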
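The Meta-World tasks referenced in the Open Datasets row are publicly available through the `metaworld` package. Below is a rough loading sketch assuming the MT1 single-task interface and an illustrative task name (`door-open-v2`); exact `reset`/`step` return signatures vary across metaworld and gym versions.

```python
import random
import metaworld

# Instantiate a single Meta-World manipulation task (task name is illustrative).
mt1 = metaworld.MT1('door-open-v2')
env = mt1.train_classes['door-open-v2']()
env.set_task(random.choice(mt1.train_tasks))  # sample a goal configuration

obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())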
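The hyperparameter sweep quoted in the Experiment Setup row amounts to a small grid. The values below are taken directly from the section; the loop and the print stand-in for job launching are illustrative.

```python
from itertools import product

beta_0 = 0.05                    # fixed for all exploration methods
rhos = [0.001, 0.0001, 0.00001]  # decay rates considered
ks = [5, 10]                     # only used by state-entropy-based exploration

for rho, k in product(rhos, ks):
    config = {"beta_0": beta_0, "rho": rho, "k": k}
    print(config)  # stand-in for launching one training run per configuration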