HIQL: Offline Goal-Conditioned RL with Latent States as Actions

Authors: Seohong Park, Dibya Ghosh, Benjamin Eysenbach, Sergey Levine

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We then apply our method to offline goal-reaching benchmarks, showing that our method can solve long-horizon tasks that stymie prior methods, can scale to high-dimensional image observations, and can readily make use of action-free data.
Researcher Affiliation | Academia | ¹University of California, Berkeley; ²Princeton University
Pseudocode | Yes | Algorithm 1: Hierarchical Implicit Q-Learning (HIQL). (A hedged sketch of the policy-extraction losses follows this table.)
Open Source Code | Yes | Our code is available at https://seohong.me/projects/hiql/
Open Datasets | Yes | We use the four medium and large maze datasets from the original D4RL benchmark [28]. CALVIN [63], another long-horizon manipulation environment... The dataset accompanying CALVIN [84]... Roboverse [25, 104] is a pixel-based, goal-conditioned robotic manipulation environment. We use the same dataset and tasks used in Zheng et al. [104].
Dataset Splits | Yes | The dataset consists of 3750 length-300 trajectories, out of which we use the first 3334 trajectories for training (corresponding to approximately 1,000,000 transitions), while the remaining trajectories are used as a validation set. (A small sketch of this split also follows the table.)
Hardware Specification | Yes | We run our experiments on an internal GPU cluster composed of TITAN RTX and A5000 GPUs.
Software Dependencies | No | The paper states, 'We implement HIQL based on Jax RL Minimal [32],' but it does not provide specific version numbers for this or any other key software components, such as Python, JAX, or CUDA.
Experiment Setup | Yes | We present the hyperparameters used in our experiments in Table 4, where we mostly follow the network architectures and hyperparameters used by Ghosh et al. [34].
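For readers who want a concrete picture of Algorithm 1, below is a minimal sketch (not the authors' released code) of how HIQL's two advantage-weighted policy-extraction losses might be written in JAX. The callables `value_fn`, `high_logp`, and `low_logp`, the batch keys, and the hyperparameter defaults are illustrative assumptions, not the repository's API.

```python
import jax.numpy as jnp

def awr_loss(advantage, log_prob, beta=1.0, clip=100.0):
    # Advantage-weighted regression: weight each sample's log-likelihood
    # by a clipped exponentiated advantage.
    weight = jnp.minimum(jnp.exp(beta * advantage), clip)
    return -jnp.mean(weight * log_prob)

def hiql_policy_losses(value_fn, high_logp, low_logp, batch, beta=1.0):
    s, s_next = batch["obs"], batch["next_obs"]
    s_sub, g, a = batch["subgoal"], batch["goal"], batch["action"]
    # High-level policy: propose a subgoal several steps ahead; its advantage
    # is how much the subgoal improves the value with respect to the final goal.
    adv_high = value_fn(s_sub, g) - value_fn(s, g)
    loss_high = awr_loss(adv_high, high_logp(s, g, s_sub), beta)
    # Low-level policy: pick the action; its advantage is how much the next
    # state improves the value with respect to the subgoal.
    adv_low = value_fn(s_next, s_sub) - value_fn(s, s_sub)
    loss_low = awr_loss(adv_low, low_logp(s, s_sub, a), beta)
    return loss_high, loss_low
```

In the paper, the high-level policy outputs a latent representation of the subgoal state (hence "latent states as actions"); the sketch above elides that detail and treats the subgoal observation itself as the high-level action.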
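The CALVIN train/validation split quoted in the Dataset Splits row is simple enough to state in code. Below is a hedged sketch, assuming the data is loaded as an ordered sequence of 3750 trajectories; the function and constant names are hypothetical.

```python
# Split described above: 3750 trajectories of length 300 each,
# the first 3334 for training, the remainder for validation.
NUM_TRAIN_TRAJECTORIES = 3334

def split_calvin(trajectories):
    assert len(trajectories) == 3750, "expects the full CALVIN dataset"
    train = trajectories[:NUM_TRAIN_TRAJECTORIES]  # 3334 * 300 ≈ 1,000,000 transitions
    val = trajectories[NUM_TRAIN_TRAJECTORIES:]    # remaining 416 trajectories
    return train, val
```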