HIQL: Offline Goal-Conditioned RL with Latent States as Actions
Authors: Seohong Park, Dibya Ghosh, Benjamin Eysenbach, Sergey Levine
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We then apply our method to offline goal-reaching benchmarks, showing that our method can solve long-horizon tasks that stymie prior methods, can scale to high-dimensional image observations, and can readily make use of action-free data. |
| Researcher Affiliation | Academia | ¹University of California, Berkeley ²Princeton University |
| Pseudocode | Yes | Algorithm 1 Hierarchical Implicit Q-Learning (HIQL) |
| Open Source Code | Yes | Our code is available at https://seohong.me/projects/hiql/ |
| Open Datasets | Yes | We use the four medium and large maze datasets from the original D4RL benchmark [28]. CALVIN [63], another long-horizon manipulation environment... The dataset accompanying CALVIN [84]... Roboverse [25, 104] is a pixel-based, goal-conditioned robotic manipulation environment. We use the same dataset and tasks used in Zheng et al. [104]. |
| Dataset Splits | Yes | The dataset consists of 3750 length-300 trajectories, out of which we use the first 3334 trajectories for training (which correspond to approximately 1000000 transitions), while the remaining trajectories are used as a validation set. (A sketch of this split follows the table.) |
| Hardware Specification | Yes | We run our experiments on an internal GPU cluster composed of TITAN RTX and A5000 GPUs. |
| Software Dependencies | No | The paper states, 'We implement HIQL based on Jax RL Minimal [32],' but it does not provide specific version numbers for this or any other key software component, such as Python, JAX, or CUDA. |
| Experiment Setup | Yes | We present the hyperparameters used in our experiments in Table 4, where we mostly follow the network architectures and hyperparameters used by Ghosh et al. [34]. |
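
For concreteness, the train/validation split quoted in the Dataset Splits row amounts to slicing the trajectory set chronologically. The snippet below is a minimal sketch, not the authors' code: the array layout `(num_trajectories, traj_len, obs_dim)` and the names `split_trajectories` and `NUM_TRAIN` are assumptions made purely for illustration.

```python
import numpy as np

# Constants quoted in the paper's split description.
NUM_TRAJECTORIES = 3750  # total trajectories in the dataset
TRAJ_LEN = 300           # steps per trajectory
NUM_TRAIN = 3334         # first trajectories used for training

def split_trajectories(trajectories: np.ndarray):
    """Chronological split: the first NUM_TRAIN trajectories go to
    training, the remaining ones (here 416) to validation."""
    return trajectories[:NUM_TRAIN], trajectories[NUM_TRAIN:]

# Dummy data with a 1-dimensional observation, just to sanity-check the counts.
data = np.zeros((NUM_TRAJECTORIES, TRAJ_LEN, 1))
train, val = split_trajectories(data)
print(train.shape[0] * TRAJ_LEN)  # 1000200 transitions, i.e. ~1,000,000
print(val.shape[0])               # 416 validation trajectories
```

Note that the split is deterministic rather than random, which matches the quote's "first 3334 trajectories" phrasing; 3334 × 300 = 1,000,200, the "approximately 1000000 transitions" stated above.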