Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
HIQL: Offline Goal-Conditioned RL with Latent States as Actions
Authors: Seohong Park, Dibya Ghosh, Benjamin Eysenbach, Sergey Levine
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We then apply our method to offline goal-reaching benchmarks, showing that our method can solve long-horizon tasks that stymie prior methods, can scale to high-dimensional image observations, and can readily make use of action-free data. |
| Researcher Affiliation | Academia | ¹University of California, Berkeley; ²Princeton University |
| Pseudocode | Yes | Algorithm 1 Hierarchical Implicit Q-Learning (HIQL) |
| Open Source Code | Yes | Our code is available at https://seohong.me/projects/hiql/ |
| Open Datasets | Yes | We use the four medium and large maze datasets from the original D4RL benchmark [28]. CALVIN [63], another long-horizon manipulation environment...The dataset accompanying CALVIN [84]... Roboverse [25, 104] is a pixel-based, goal-conditioned robotic manipulation environment. We use the same dataset and tasks used in Zheng et al. [104]. |
| Dataset Splits | Yes | The dataset consists of 3750 length-300 trajectories, out of which we use the first 3334 trajectories for training (which correspond to approximately 1000000 transitions), while the remaining trajectories are used as a validation set. |
| Hardware Specification | Yes | We run our experiments on an internal GPU cluster composed of TITAN RTX and A5000 GPUs. |
| Software Dependencies | No | The paper states, 'We implement HIQL based on Jax RL Minimal [32],' but it does not provide specific version numbers for this or any other key software components, such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We present the hyperparameters used in our experiments in Table 4, where we mostly follow the network architectures and hyperparameters used by Ghosh et al. [34]. |
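
The split quoted in the Dataset Splits row can be checked with a little arithmetic. The sketch below is illustrative only and is not taken from the authors' code: the function and variable names are hypothetical, and it assumes each length-300 trajectory contributes 300 transitions, which is consistent with the paper's "approximately 1000000 transitions" for the training portion.

```python
# Illustrative sketch, not the authors' code. Numbers come from the
# Dataset Splits row; all names here are hypothetical.
NUM_TRAJECTORIES = 3750   # total trajectories in the dataset
TRAJECTORY_LENGTH = 300   # assumed number of transitions per trajectory
NUM_TRAIN = 3334          # the first 3334 trajectories are used for training


def split_indices(num_trajectories: int, num_train: int) -> tuple[range, range]:
    """Return (train, validation) trajectory indices for a first-N split."""
    if not 0 <= num_train <= num_trajectories:
        raise ValueError("num_train must lie between 0 and num_trajectories")
    return range(0, num_train), range(num_train, num_trajectories)


train_idx, val_idx = split_indices(NUM_TRAJECTORIES, NUM_TRAIN)

# Transition counts under the 300-transitions-per-trajectory assumption.
train_transitions = len(train_idx) * TRAJECTORY_LENGTH
val_transitions = len(val_idx) * TRAJECTORY_LENGTH

print(f"train: {len(train_idx)} trajectories, {train_transitions} transitions")
print(f"validation: {len(val_idx)} trajectories, {val_transitions} transitions")
```

Under these assumptions the training portion holds 1,000,200 transitions and the validation portion holds the remaining 416 trajectories (124,800 transitions).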