Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Accelerating Reinforcement Learning with Value-Conditional State Entropy Exploration

Authors: Dongyoung Kim, Jinwoo Shin, Pieter Abbeel, Younggyo Seo

NeurIPS 2023 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We design our experiments to evaluate the generality of our maximum value-conditional state entropy (VCSE) exploration as a technique for improving the sample-efficiency of various RL algorithms (Mnih et al., 2016; Yarats et al., 2021a). We conduct extensive experiments on a range of challenging and high-dimensional domains, including partially-observable navigation tasks from MiniGrid (Chevalier-Boisvert et al., 2018), pixel-based locomotion tasks from DeepMind Control Suite (DMC; Tassa et al. (2020)), and pixel-based manipulation tasks from Meta-World (Yu et al., 2020).
Researcher Affiliation Collaboration Dongyoung Kim (KAIST), Jinwoo Shin (KAIST), Pieter Abbeel (UC Berkeley), Younggyo Seo (KAIST; now at Dyson Robot Learning Lab). Correspondence to EMAIL.
Pseudocode Yes Algorithm 1 Maximum Value-Conditional State Entropy Exploration
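The paper's Algorithm 1 estimates state entropy conditioned on value estimates via a k-nearest-neighbor (KSG-style) construction. A minimal sketch of one plausible reading is below: neighbors are found under the max-norm over state distance and (batch-normalized) value distance, so that a sample's intrinsic bonus reflects state novelty only relative to transitions with similar values. The exact reward form and normalization constants here are assumptions, not the authors' implementation.

```python
import numpy as np

def vcse_intrinsic_reward(states, values, k=5):
    """Sketch of a value-conditional state entropy intrinsic reward.

    states: (N, D) array of state embeddings in a mini-batch.
    values: (N,) array of value estimates for those states.
    Returns an (N,) array of non-negative intrinsic rewards.
    """
    # Normalize value estimates with mini-batch mean and std,
    # as described in the paper's experiment setup.
    v = (values - values.mean()) / (values.std() + 1e-8)

    # Pairwise distances in state space and in (normalized) value space.
    d_s = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    d_v = np.abs(v[:, None] - v[None, :])

    # Joint max-norm distance, as used by KSG-style conditional
    # entropy estimators: a neighbor must be close in BOTH spaces.
    d_joint = np.maximum(d_s, d_v)
    np.fill_diagonal(d_joint, np.inf)  # exclude self-distance

    # Distance to the k-th nearest neighbor for each sample.
    eps = np.sort(d_joint, axis=1)[:, k - 1]

    # log(eps + 1) bonus, in the style of k-NN entropy-based
    # exploration rewards (assumed form, not taken from the paper).
    return np.log(eps + 1.0)
```

In this sketch `k=5` matches the MiniGrid setting quoted in the Experiment Setup row; the pixel-based experiments reportedly use `k=12`.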
Open Source Code Yes Source code is available at https://sites.google.com/view/rl-vcse.
Open Datasets Yes We conduct extensive experiments on a range of challenging and high-dimensional domains, including partially-observable navigation tasks from MiniGrid (Chevalier-Boisvert et al., 2018), pixel-based locomotion tasks from DeepMind Control Suite (DMC; Tassa et al. (2020)), and pixel-based manipulation tasks from Meta-World (Yu et al., 2020).
Dataset Splits No The paper does not explicitly provide details about training/validation/test dataset splits for reproduction.
Hardware Specification Yes For Mini Grid experiments, we use a single NVIDIA TITAN Xp GPU and 8 CPU cores for each training run. For Deep Mind Control Suite and Meta-World experiments, we use a single NVIDIA 2080Ti GPU and 8 CPU cores for each training run.
Software Dependencies No The paper mentions the software implementations and algorithms it builds on (e.g., RE3, DrQ-v2, MWM, SAC) and provides links to their GitHub repositories. However, it does not specify version numbers for these or for other core software dependencies (such as Python or PyTorch).
Experiment Setup Yes We use k = 5 for both SE and VCSE by following the original implementation. ... We use the fixed noise of 0.2. We use k = 12 for both SE and VCSE. ... We normalize value estimates with their mean and standard deviation computed with samples within a mini-batch. ... We use the same hyperparameter of fixed intrinsic scale β = 0.005 and k = 5 for both SE and VCSE following the original implementation. For both SE and VCSE exploration, we find that using β = 0.1 achieves the overall best performance. We also use k = 12 for both SE and VCSE.
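The setup quotes above mention two concrete ingredients: normalizing value estimates with mini-batch statistics, and scaling the intrinsic bonus by a fixed coefficient β before adding it to the task reward. A minimal sketch of how these pieces might fit together is shown below; the function names are illustrative, and only the β values and the normalization scheme come from the quoted text.

```python
import numpy as np

def normalize_values(values):
    """Normalize value estimates with their mini-batch mean and
    standard deviation, as described in the experiment setup."""
    return (values - values.mean()) / (values.std() + 1e-8)

def combined_reward(r_ext, r_int, beta=0.005):
    """Add a beta-scaled intrinsic bonus to the extrinsic reward.

    beta = 0.005 is the fixed intrinsic scale quoted for one setting;
    beta = 0.1 is reported as best-performing in another. How the
    rewards are combined is an assumption of this sketch."""
    return r_ext + beta * r_int
```

Either β can be passed explicitly, e.g. `combined_reward(r_ext, r_int, beta=0.1)` for the setting where 0.1 is reported to work best.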