Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings
Authors: Kevin Frans, Seohong Park, Pieter Abbeel, Sergey Levine
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show that FRE agents trained on diverse random unsupervised reward functions can generalize to solve novel tasks in a range of simulated robotic benchmarks, often outperforming previous zero-shot RL and offline RL methods. |
| Researcher Affiliation | Academia | Kevin Frans, Seohong Park, Pieter Abbeel, Sergey Levine (University of California, Berkeley); kvfrans@berkeley.edu |
| Pseudocode | Yes | Algorithm 1 Functional Reward Encodings (FRE) |
| Open Source Code | Yes | Code for this project is provided at: github.com/kvfrans/fre. |
| Open Datasets | Yes | We utilize the antmaze-large-diverse-v2 dataset from D4RL (Fu et al., 2020). [...] The ExORL dataset is a standard collection of offline data for RL, consisting of trajectories sampled by an exploratory policy on DeepMind Control Suite (Tassa et al., 2018) tasks. |
| Dataset Splits | No | The paper discusses training procedures and data sampling for encoding/decoding, but does not provide specific train/validation/test dataset splits with percentages or counts for reproduction. |
| Hardware Specification | No | This research used the Savio computational cluster resource provided by the Berkeley Research Computing program at UC Berkeley. The acknowledgment names a cluster but gives no specific hardware details such as CPU or GPU models. |
| Software Dependencies | No | Appendix A lists the optimizer as "Adam" and mentions "IQL Expectile", but does not specify version numbers for any software, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | Appendix A. Hyperparameters [lists specific values for Batch Size, Training Steps, Learning Rate, Network Layers, etc.] |
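
The Pseudocode row references Algorithm 1 (Functional Reward Encodings) but the table does not reproduce it. The sketch below is a minimal, hypothetical illustration of the reward-encoding idea implied by the abstract quote: a reward function is represented by a set of (state, reward) samples drawn from randomly generated unsupervised reward functions, which an encoder would compress into a latent vector that conditions an offline RL policy. All names here are placeholders, and this is not the paper's Algorithm 1 or its released implementation.

```python
# Illustrative sketch only: hypothetical helpers for sampling random unsupervised
# reward functions and labeling dataset states with them. In the actual method,
# the (state, reward) pairs would be fed to a learned encoder producing a latent z,
# a decoder would predict rewards at held-out states, and the latent-conditioned
# policy would be trained offline (the paper's hyperparameters mention IQL).
import numpy as np

def random_goal_reward(goal):
    """One simple family of unsupervised reward functions: negative distance to a goal."""
    return lambda state: -float(np.linalg.norm(state - goal))

def sample_reward_annotations(reward_fn, states, n=32):
    """Label a random subset of dataset states with the sampled reward function."""
    idx = np.random.choice(len(states), size=n, replace=False)
    subset = states[idx]
    rewards = np.array([reward_fn(s) for s in subset])
    return subset, rewards
```
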
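For context on the Open Datasets row, the snippet below shows one common way to load the cited antmaze-large-diverse-v2 dataset through the standard D4RL API. This is an illustrative assumption rather than the authors' data pipeline from github.com/kvfrans/fre, and package versions are not specified in the paper.

```python
# Minimal sketch of loading the antmaze-large-diverse-v2 offline dataset via D4RL.
# Assumes the standard gym + d4rl packages are installed.
import gym
import d4rl  # importing d4rl registers its environments with gym

env = gym.make("antmaze-large-diverse-v2")
dataset = d4rl.qlearning_dataset(env)  # dict of numpy arrays

# Transition arrays used for offline RL training.
print(dataset["observations"].shape, dataset["actions"].shape, dataset["rewards"].shape)
```
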