Foundation Policies with Hilbert Representations
Authors: Seohong Park, Tobias Kreiman, Sergey Levine
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through our experiments on simulated robotic locomotion and manipulation benchmarks, we show that our unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot fashion, even often outperforming prior methods designed specifically for each setting. |
| Researcher Affiliation | Academia | Seohong Park¹, Tobias Kreiman¹, Sergey Levine¹ (¹University of California, Berkeley). Correspondence to: Seohong Park <seohong@berkeley.edu>. |
| Pseudocode | Yes | Algorithm 1 Hilbert Foundation Policies (HILPs) |
| Open Source Code | Yes | Our code and videos are available at https://seohong.me/projects/hilp/. |
| Open Datasets | Yes | For benchmarks, following Touati et al. (2023), we use the Unsupervised RL Benchmark (Laskin et al., 2021) and ExORL datasets (Yarats et al., 2022)... For benchmarks, we consider the goal-conditioned variants of Ant Maze and Kitchen tasks (Figure 4) from the D4RL suite (Fu et al., 2020; Park et al., 2023). |
| Dataset Splits | No | The paper gives no explicit train/validation/test splits (percentages, sample counts, or predefined partitions). It describes training on 'unlabeled trajectory data D' and evaluating on 'test-time tasks' or 'evaluation settings', but does not specify how the data was partitioned beyond using the existing benchmark datasets. |
| Hardware Specification | Yes | This research used the Savio computational cluster resource provided by the Berkeley Research Computing program at UC Berkeley. ... We run our experiments on an internal cluster consisting of A5000 GPUs. |
| Software Dependencies | No | We implement HILPs based on two different codebases: the official implementation of FB representations (Touati et al., 2023) for zero-shot RL experiments and that of HIQL (Park et al., 2023) for offline goal-conditioned RL and hierarchical RL experiments. ... We use IQL (Kostrikov et al., 2022) with AWR (Peng et al., 2019) as an offline algorithm to train policies. ... For GC-CQL, we modify the Jax CQL repository (Geng, 2022) to make it compatible with our goal-conditioned setting. (See the illustrative IQL+AWR sketch below the table.) |
| Experiment Setup | Yes | We report the full list of the hyperparameters used in our zero-shot RL experiments in Table 5. ... We report the full list of the hyperparameters used in our offline goal-conditioned RL experiments in Table 6. ... We report the full list of the hyperparameters used in our zero-shot hierarchical RL experiments in Table 7. |
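
The Software Dependencies row quotes the paper's statement that policies are trained offline with IQL (Kostrikov et al., 2022) and extracted with AWR (Peng et al., 2019). As a point of reference only, the sketch below shows the standard IQL expectile value loss and AWR-weighted policy extraction; it is not the authors' implementation, and the tensor shapes and coefficients (`tau`, `beta`, `max_weight`) are illustrative placeholders rather than the paper's hyperparameters (those are listed in its Tables 5, 6, and 7).

```python
# Hedged sketch, not the authors' code: minimal PyTorch versions of the two losses
# named in the Software Dependencies row (IQL expectile value loss, AWR policy loss).
# All shapes and coefficients below are illustrative assumptions.
import torch


def iql_value_loss(q: torch.Tensor, v: torch.Tensor, tau: float = 0.7) -> torch.Tensor:
    """IQL: regress V(s) toward an expectile of Q(s, a) with an asymmetric L2 loss."""
    diff = q - v
    weight = torch.abs(tau - (diff < 0).float())  # tau for positive residuals, 1 - tau otherwise
    return (weight * diff.pow(2)).mean()


def awr_policy_loss(log_prob: torch.Tensor, advantage: torch.Tensor,
                    beta: float = 3.0, max_weight: float = 100.0) -> torch.Tensor:
    """AWR: behavioral-cloning log-likelihood weighted by exp(advantage / beta), clipped."""
    weight = torch.clamp(torch.exp(advantage / beta), max=max_weight)
    return -(weight.detach() * log_prob).mean()


if __name__ == "__main__":
    # Dummy batch of 256 transitions; in practice these come from critic/value/policy networks.
    q_values = torch.randn(256)   # Q(s, a) from a (target) critic
    v_values = torch.randn(256)   # V(s) from the value network
    log_prob = torch.randn(256)   # log pi(a | s) of the dataset actions
    print("value loss:", iql_value_loss(q_values, v_values).item())
    print("policy loss:", awr_policy_loss(log_prob, q_values - v_values).item())
```

In the paper's setting the critic, value function, and policy would additionally be conditioned on the latent task (or goal) vector; the sketch omits that conditioning for brevity.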