Accelerating Reinforcement Learning with Value-Conditional State Entropy Exploration

Authors: Dongyoung Kim, Jinwoo Shin, Pieter Abbeel, Younggyo Seo

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We design our experiments to evaluate the generality of our maximum value-conditional state entropy (VCSE) exploration as a technique for improving the sample efficiency of various RL algorithms (Mnih et al., 2016; Yarats et al., 2021a). We conduct extensive experiments on a range of challenging and high-dimensional domains, including partially observable navigation tasks from MiniGrid (Chevalier-Boisvert et al., 2018), pixel-based locomotion tasks from the DeepMind Control Suite (DMC; Tassa et al., 2020), and pixel-based manipulation tasks from Meta-World (Yu et al., 2020).
Researcher Affiliation | Collaboration | Dongyoung Kim (KAIST), Jinwoo Shin (KAIST), Pieter Abbeel (UC Berkeley), Younggyo Seo (KAIST; now at Dyson Robot Learning Lab). Correspondence to younggyo.seo@dyson.com.
Pseudocode | Yes | Algorithm 1: Maximum Value-Conditional State Entropy Exploration
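Since the response above only names Algorithm 1, a brief sketch may help convey its core computation. The sketch below is a simplified reading of the value-conditional k-NN bonus, not the authors' implementation: it normalizes value estimates within the minibatch, measures the k-th nearest-neighbor distance under the max-norm in the joint (state, value) space, and uses its log as the intrinsic reward. The paper's full KSG-style entropy estimator includes digamma correction terms omitted here; all function and variable names are illustrative.

```python
import torch

def vcse_bonus_sketch(states: torch.Tensor, values: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Simplified value-conditional k-NN bonus (illustrative, not the exact estimator).

    states: (N, d) tensor of state embeddings; values: (N,) tensor of value estimates.
    """
    # Normalize value estimates with the mini-batch mean and standard deviation,
    # as described in the paper's experiment setup.
    v = (values - values.mean()) / (values.std() + 1e-8)
    # Pairwise Euclidean distances in state space; absolute differences in value space.
    s_dist = torch.cdist(states, states)        # (N, N)
    v_dist = (v[:, None] - v[None, :]).abs()    # (N, N)
    # The max-norm over the joint (state, value) space conditions the state-space
    # neighborhood on having a similar value estimate.
    joint_dist = torch.maximum(s_dist, v_dist)
    # Distance to the k-th nearest neighbor (index k skips the point itself at distance 0).
    eps_k = joint_dist.sort(dim=1).values[:, k]
    # Larger radius -> sparser neighborhood among similar-value states -> larger bonus.
    return torch.log(eps_k + 1.0)
```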
Open Source Code | Yes | Source code is available at https://sites.google.com/view/rl-vcse.
Open Datasets | Yes | We conduct extensive experiments on a range of challenging and high-dimensional domains, including partially observable navigation tasks from MiniGrid (Chevalier-Boisvert et al., 2018), pixel-based locomotion tasks from the DeepMind Control Suite (DMC; Tassa et al., 2020), and pixel-based manipulation tasks from Meta-World (Yu et al., 2020).
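For context, all three suites quoted above are openly available; a minimal sketch of loading one task from each follows. Package names and APIs reflect current open-source releases (which the paper does not pin), and the specific tasks shown are illustrative, not necessarily those used in the paper.

```python
import gymnasium as gym
import minigrid  # noqa: F401  (importing registers the MiniGrid-* environments)
from dm_control import suite
import metaworld

# Partially observable navigation (MiniGrid).
nav_env = gym.make("MiniGrid-DoorKey-8x8-v0")

# Locomotion (DeepMind Control Suite).
loco_env = suite.load(domain_name="walker", task_name="walk")

# Manipulation (Meta-World).
mt1 = metaworld.MT1("door-open-v2")
manip_env = mt1.train_classes["door-open-v2"]()
manip_env.set_task(mt1.train_tasks[0])
```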
Dataset Splits | No | The paper does not explicitly provide details about training/validation/test dataset splits for reproduction.
Hardware Specification | Yes | For MiniGrid experiments, we use a single NVIDIA TITAN Xp GPU and 8 CPU cores for each training run. For DeepMind Control Suite and Meta-World experiments, we use a single NVIDIA 2080 Ti GPU and 8 CPU cores for each training run.
Software Dependencies | No | The paper mentions various software implementations and algorithms used (e.g., RE3, DrQ-v2, MWM, SAC) and provides links to their GitHub repositories. However, it does not specify explicit version numbers for these or other core software dependencies (such as Python or PyTorch).
Experiment Setup | Yes | We use k = 5 for both SE and VCSE by following the original implementation. ... We use a fixed noise of 0.2. We use k = 12 for both SE and VCSE. ... We normalize value estimates with their mean and standard deviation computed with samples within a mini-batch. ... We use the same hyperparameters of a fixed intrinsic scale β = 0.005 and k = 5 for both SE and VCSE, following the original implementation. For both SE and VCSE exploration, we find that using β = 0.1 achieves the overall best performance. We also use k = 12 for both SE and VCSE.
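To make the quoted intrinsic scale concrete, the hedged snippet below shows how a fixed β typically enters the training signal; `extrinsic_reward` and `vcse_bonus_sketch` (from the sketch above) are illustrative names, not the authors' code.

```python
beta = 0.1  # quoted intrinsic scale reported to work best for the pixel-based experiments
# Intrinsic bonus is added to the task reward, scaled by beta.
total_reward = extrinsic_reward + beta * vcse_bonus_sketch(states, values, k=12)
```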