Accelerating Reinforcement Learning with Value-Conditional State Entropy Exploration
Authors: Dongyoung Kim, Jinwoo Shin, Pieter Abbeel, Younggyo Seo
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We design our experiments to evaluate the generality of our maximum value-conditional state entropy (VCSE) exploration as a technique for improving the sample-efficiency of various RL algorithms (Mnih et al., 2016; Yarats et al., 2021a). We conduct extensive experiments on a range of challenging and high-dimensional domains, including partially-observable navigation tasks from MiniGrid (Chevalier-Boisvert et al., 2018), pixel-based locomotion tasks from DeepMind Control Suite (DMC; Tassa et al., 2020), and pixel-based manipulation tasks from Meta-World (Yu et al., 2020). |
| Researcher Affiliation | Collaboration | Dongyoung Kim, KAIST; Jinwoo Shin, KAIST; Pieter Abbeel, UC Berkeley; Younggyo Seo, KAIST (now at Dyson Robot Learning Lab). Correspondence to younggyo.seo@dyson.com. |
| Pseudocode | Yes | Algorithm 1 Maximum Value-Conditional State Entropy Exploration |
| Open Source Code | Yes | Source code is available at https://sites.google.com/view/rl-vcse. |
| Open Datasets | Yes | We conduct extensive experiments on a range of challenging and high-dimensional domains, including partially-observable navigation tasks from MiniGrid (Chevalier-Boisvert et al., 2018), pixel-based locomotion tasks from DeepMind Control Suite (DMC; Tassa et al., 2020), and pixel-based manipulation tasks from Meta-World (Yu et al., 2020). |
| Dataset Splits | No | The paper does not explicitly provide details about training/validation/test dataset splits for reproduction. |
| Hardware Specification | Yes | For Mini Grid experiments, we use a single NVIDIA TITAN Xp GPU and 8 CPU cores for each training run. For Deep Mind Control Suite and Meta-World experiments, we use a single NVIDIA 2080Ti GPU and 8 CPU cores for each training run. |
| Software Dependencies | No | The paper mentions various software implementations and algorithms used (e.g., RE3, DrQ-v2, MWM, SAC) and provides links to their GitHub repositories. However, it does not specify explicit version numbers for these or other core software dependencies (such as Python or PyTorch). |
| Experiment Setup | Yes | We use k = 5 for both SE and VCSE by following the original implementation. ... We use the fixed noise of 0.2. We use k = 12 for both SE and VCSE. ... We normalize value estimates with their mean and standard deviation computed with samples within a mini-batch. ... We use the same hyperparameter of fixed intrinsic scale β = 0.005 and k = 5 for both SE and VCSE following the original implementation. For both SE and VCSE exploration, we find that using β = 0.1 achieves the overall best performance. We also use k = 12 for both SE and VCSE. |
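The setup details above (k-nearest-neighbor entropy estimation with k = 5 or k = 12, and value estimates normalized by their mini-batch mean and standard deviation) can be illustrated with a minimal sketch of a value-conditional k-NN entropy bonus. This is an assumed NumPy illustration of the general idea, not the paper's implementation: the function name `vcse_intrinsic_reward`, the Euclidean/absolute distance choices, and the KSG-style max over the state and value blocks are all assumptions made here for clarity.

```python
import numpy as np

def vcse_intrinsic_reward(states, values, k=5, eps=1e-6):
    """Hypothetical sketch of a value-conditional k-NN entropy bonus.

    states: (N, D) array of (encoded) states from a mini-batch.
    values: (N,) array of value estimates for those states.
    Returns an (N,) array of intrinsic rewards.
    """
    # Normalize value estimates with mini-batch mean/std, as the paper's
    # experiment setup describes.
    v = (values - values.mean()) / (values.std() + eps)

    # Pairwise distances in state space and in normalized-value space.
    s_dist = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    v_dist = np.abs(v[:, None] - v[None, :])

    # Joint distance as the max over the two blocks (a KSG-style choice,
    # assumed here): neighbors must be close in both state and value.
    joint = np.maximum(s_dist, v_dist)
    np.fill_diagonal(joint, np.inf)  # exclude each sample's self-distance

    # Distance to the k-th nearest neighbor in the joint space.
    knn = np.sort(joint, axis=1)[:, k - 1]

    # Entropy-style bonus: a state far from other similarly-valued states
    # gets a larger intrinsic reward.
    return np.log(knn + eps)
```

In use, such a bonus would be scaled by an intrinsic coefficient (the quoted setup reports β = 0.005 or β = 0.1 depending on the domain) and added to the extrinsic reward during training.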