HIQL: Offline Goal-Conditioned RL with Latent States as Actions
Authors: Seohong Park, Dibya Ghosh, Benjamin Eysenbach, Sergey Levine
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We then apply our method to offline goal-reaching benchmarks, showing that our method can solve long-horizon tasks that stymie prior methods, can scale to high-dimensional image observations, and can readily make use of action-free data. |
| Researcher Affiliation | Academia | ¹University of California, Berkeley ²Princeton University |
| Pseudocode | Yes | Algorithm 1 Hierarchical Implicit Q-Learning (HIQL) |
| Open Source Code | Yes | Our code is available at https://seohong.me/projects/hiql/ |
| Open Datasets | Yes | We use the four medium and large maze datasets from the original D4RL benchmark [28]. CALVIN [63], another long-horizon manipulation environment... The dataset accompanying CALVIN [84]... Roboverse [25, 104] is a pixel-based, goal-conditioned robotic manipulation environment. We use the same dataset and tasks used in Zheng et al. [104]. |
| Dataset Splits | Yes | The dataset consists of 3750 length-300 trajectories, out of which we use the first 3334 trajectories for training (which correspond to approximately 1000000 transitions), while the remaining trajectories are used as a validation set. (A sketch of this split follows the table.) |
| Hardware Specification | Yes | We run our experiments on an internal GPU cluster composed of TITAN RTX and A5000 GPUs. |
| Software Dependencies | No | The paper states, 'We implement HIQL based on Jax RL Minimal [32],' but it does not provide specific version numbers for this or any other key software component, such as Python, JAX, or CUDA. |
| Experiment Setup | Yes | We present the hyperparameters used in our experiments in Table 4, where we mostly follow the network architectures and hyperparameters used by Ghosh et al. [34]. |
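
For concreteness, the train/validation split quoted in the Dataset Splits row amounts to slicing the trajectory set chronologically. The snippet below is a minimal sketch, not the authors' code: the array layout `(num_trajectories, traj_len, obs_dim)` and the names `split_trajectories` and `NUM_TRAIN` are assumptions made purely for illustration.

```python
import numpy as np

# Constants quoted in the paper's split description.
NUM_TRAJECTORIES = 3750  # total trajectories in the dataset
TRAJ_LEN = 300           # steps per trajectory
NUM_TRAIN = 3334         # first trajectories used for training

def split_trajectories(trajectories: np.ndarray):
    """Chronological split: the first NUM_TRAIN trajectories go to
    training, the remaining ones (here 416) to validation."""
    return trajectories[:NUM_TRAIN], trajectories[NUM_TRAIN:]

# Dummy data with a 1-dimensional observation, just to sanity-check the counts.
data = np.zeros((NUM_TRAJECTORIES, TRAJ_LEN, 1))
train, val = split_trajectories(data)
print(train.shape[0] * TRAJ_LEN)  # 1000200 transitions, i.e. ~1,000,000
print(val.shape[0])               # 416 validation trajectories
```

Note that the split is deterministic rather than random, which matches the quote's "first 3334 trajectories" phrasing; 3334 × 300 = 1,000,200, the "approximately 1000000 transitions" stated above.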