Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Accelerating Reinforcement Learning with Value-Conditional State Entropy Exploration

Authors: Dongyoung Kim, Jinwoo Shin, Pieter Abbeel, Younggyo Seo

NeurIPS 2023 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We design our experiments to evaluate the generality of our maximum value-conditional state entropy (VCSE) exploration as a technique for improving the sample-efficiency of various RL algorithms (Mnih et al., 2016; Yarats et al., 2021a). We conduct extensive experiments on a range of challenging and high-dimensional domains, including partially-observable navigation tasks from MiniGrid (Chevalier-Boisvert et al., 2018), pixel-based locomotion tasks from DeepMind Control Suite (DMC; Tassa et al. (2020)), and pixel-based manipulation tasks from Meta-World (Yu et al., 2020).
Researcher Affiliation Collaboration Dongyoung Kim (KAIST), Jinwoo Shin (KAIST), Pieter Abbeel (UC Berkeley), Younggyo Seo (KAIST; now at Dyson Robot Learning Lab). Correspondence to EMAIL.
Pseudocode Yes Algorithm 1 Maximum Value-Conditional State Entropy Exploration
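The paper's Algorithm 1 estimates state entropy conditioned on value estimates via a k-nearest-neighbor (KSG-style) construction. A minimal sketch of one plausible reading is below: neighbors are found under the max-norm over state distance and (batch-normalized) value distance, so that a sample's intrinsic bonus reflects state novelty only relative to transitions with similar values. The exact reward form and normalization constants here are assumptions, not the authors' implementation.

```python
import numpy as np

def vcse_intrinsic_reward(states, values, k=5):
    """Sketch of a value-conditional state entropy intrinsic reward.

    states: (N, D) array of state embeddings in a mini-batch.
    values: (N,) array of value estimates for those states.
    Returns an (N,) array of non-negative intrinsic rewards.
    """
    # Normalize value estimates with mini-batch mean and std,
    # as described in the paper's experiment setup.
    v = (values - values.mean()) / (values.std() + 1e-8)

    # Pairwise distances in state space and in (normalized) value space.
    d_s = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    d_v = np.abs(v[:, None] - v[None, :])

    # Joint max-norm distance, as used by KSG-style conditional
    # entropy estimators: a neighbor must be close in BOTH spaces.
    d_joint = np.maximum(d_s, d_v)
    np.fill_diagonal(d_joint, np.inf)  # exclude self-distance

    # Distance to the k-th nearest neighbor for each sample.
    eps = np.sort(d_joint, axis=1)[:, k - 1]

    # log(eps + 1) bonus, in the style of k-NN entropy-based
    # exploration rewards (assumed form, not taken from the paper).
    return np.log(eps + 1.0)
```

In this sketch `k=5` matches the MiniGrid setting quoted in the Experiment Setup row; the pixel-based experiments reportedly use `k=12`.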
Open Source Code Yes Source code is available at https://sites.google.com/view/rl-vcse.
Open Datasets Yes We conduct extensive experiments on a range of challenging and high-dimensional domains, including partially-observable navigation tasks from MiniGrid (Chevalier-Boisvert et al., 2018), pixel-based locomotion tasks from DeepMind Control Suite (DMC; Tassa et al. (2020)), and pixel-based manipulation tasks from Meta-World (Yu et al., 2020).
Dataset Splits No The paper does not explicitly provide details about training/validation/test dataset splits for reproduction.
Hardware Specification Yes For Mini Grid experiments, we use a single NVIDIA TITAN Xp GPU and 8 CPU cores for each training run. For Deep Mind Control Suite and Meta-World experiments, we use a single NVIDIA 2080Ti GPU and 8 CPU cores for each training run.
Software Dependencies No The paper mentions the software implementations and algorithms it builds on (e.g., RE3, DrQ-v2, MWM, SAC) and provides links to their GitHub repositories. However, it does not specify version numbers for these or for other core software dependencies (such as Python or PyTorch).
Experiment Setup Yes We use k = 5 for both SE and VCSE by following the original implementation. ... We use the fixed noise of 0.2. We use k = 12 for both SE and VCSE. ... We normalize value estimates with their mean and standard deviation computed with samples within a mini-batch. ... We use the same hyperparameter of fixed intrinsic scale β = 0.005 and k = 5 for both SE and VCSE following the original implementation. For both SE and VCSE exploration, we find that using β = 0.1 achieves the overall best performance. We also use k = 12 for both SE and VCSE.
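The setup quotes above mention two concrete ingredients: normalizing value estimates with mini-batch statistics, and scaling the intrinsic bonus by a fixed coefficient β before adding it to the task reward. A minimal sketch of how these pieces might fit together is shown below; the function names are illustrative, and only the β values and the normalization scheme come from the quoted text.

```python
import numpy as np

def normalize_values(values):
    """Normalize value estimates with their mini-batch mean and
    standard deviation, as described in the experiment setup."""
    return (values - values.mean()) / (values.std() + 1e-8)

def combined_reward(r_ext, r_int, beta=0.005):
    """Add a beta-scaled intrinsic bonus to the extrinsic reward.

    beta = 0.005 is the fixed intrinsic scale quoted for one setting;
    beta = 0.1 is reported as best-performing in another. How the
    rewards are combined is an assumption of this sketch."""
    return r_ext + beta * r_int
```

Either β can be passed explicitly, e.g. `combined_reward(r_ext, r_int, beta=0.1)` for the setting where 0.1 is reported to work best.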