Foundation Policies with Hilbert Representations
Authors: Seohong Park, Tobias Kreiman, Sergey Levine
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through our experiments on simulated robotic locomotion and manipulation benchmarks, we show that our unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot fashion, even often outperforming prior methods designed specifically for each setting. |
| Researcher Affiliation | Academia | Seohong Park¹, Tobias Kreiman¹, Sergey Levine¹ (¹University of California, Berkeley). Correspondence to: Seohong Park <seohong@berkeley.edu>. |
| Pseudocode | Yes | Algorithm 1 Hilbert Foundation Policies (HILPs) |
| Open Source Code | Yes | Our code and videos are available at https://seohong.me/projects/hilp/. |
| Open Datasets | Yes | For benchmarks, following Touati et al. (2023), we use the Unsupervised RL Benchmark (Laskin et al., 2021) and ExORL datasets (Yarats et al., 2022)... For benchmarks, we consider the goal-conditioned variants of Ant Maze and Kitchen tasks (Figure 4) from the D4RL suite (Fu et al., 2020; Park et al., 2023). |
| Dataset Splits | No | The paper gives no explicit train/validation/test splits (percentages, sample counts, or predefined partitions). It describes training on 'unlabeled trajectory data D' and evaluating on 'test-time tasks' or 'evaluation settings', but does not specify how the data was partitioned beyond using the existing benchmark datasets. |
| Hardware Specification | Yes | This research used the Savio computational cluster resource provided by the Berkeley Research Computing program at UC Berkeley. ... We run our experiments on an internal cluster consisting of A5000 GPUs. |
| Software Dependencies | No | We implement HILPs based on two different codebases: the official implementation of FB representations (Touati et al., 2023) for zero-shot RL experiments and that of HIQL (Park et al., 2023) for offline goal-conditioned RL and hierarchical RL experiments. ... We use IQL (Kostrikov et al., 2022) with AWR (Peng et al., 2019) as an offline algorithm to train policies. ... For GC-CQL, we modify the Jax CQL repository (Geng, 2022) to make it compatible with our goal-conditioned setting. (See the illustrative IQL+AWR sketch below the table.) |
| Experiment Setup | Yes | We report the full list of the hyperparameters used in our zero-shot RL experiments in Table 5. ... We report the full list of the hyperparameters used in our offline goal-conditioned RL experiments in Table 6. ... We report the full list of the hyperparameters used in our zero-shot hierarchical RL experiments in Table 7. |
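
The Software Dependencies row quotes the paper's statement that policies are trained offline with IQL (Kostrikov et al., 2022) and extracted with AWR (Peng et al., 2019). As a point of reference only, the sketch below shows the standard IQL expectile value loss and AWR-weighted policy extraction; it is not the authors' implementation, and the tensor shapes and coefficients (`tau`, `beta`, `max_weight`) are illustrative placeholders rather than the paper's hyperparameters (those are listed in its Tables 5, 6, and 7).

```python
# Hedged sketch, not the authors' code: minimal PyTorch versions of the two losses
# named in the Software Dependencies row (IQL expectile value loss, AWR policy loss).
# All shapes and coefficients below are illustrative assumptions.
import torch


def iql_value_loss(q: torch.Tensor, v: torch.Tensor, tau: float = 0.7) -> torch.Tensor:
    """IQL: regress V(s) toward an expectile of Q(s, a) with an asymmetric L2 loss."""
    diff = q - v
    weight = torch.abs(tau - (diff < 0).float())  # tau for positive residuals, 1 - tau otherwise
    return (weight * diff.pow(2)).mean()


def awr_policy_loss(log_prob: torch.Tensor, advantage: torch.Tensor,
                    beta: float = 3.0, max_weight: float = 100.0) -> torch.Tensor:
    """AWR: behavioral-cloning log-likelihood weighted by exp(advantage / beta), clipped."""
    weight = torch.clamp(torch.exp(advantage / beta), max=max_weight)
    return -(weight.detach() * log_prob).mean()


if __name__ == "__main__":
    # Dummy batch of 256 transitions; in practice these come from critic/value/policy networks.
    q_values = torch.randn(256)   # Q(s, a) from a (target) critic
    v_values = torch.randn(256)   # V(s) from the value network
    log_prob = torch.randn(256)   # log pi(a | s) of the dataset actions
    print("value loss:", iql_value_loss(q_values, v_values).item())
    print("policy loss:", awr_policy_loss(log_prob, q_values - v_values).item())
```

In the paper's setting the critic, value function, and policy would additionally be conditioned on the latent task (or goal) vector; the sketch omits that conditioning for brevity.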