Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings
Authors: Kevin Frans, Seohong Park, Pieter Abbeel, Sergey Levine
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show that FRE agents trained on diverse random unsupervised reward functions can generalize to solve novel tasks in a range of simulated robotic benchmarks, often outperforming previous zero-shot RL and offline RL methods. |
| Researcher Affiliation | Academia | Kevin Frans 1 Seohong Park 1 Pieter Abbeel 1 Sergey Levine 1 1 University of California, Berkeley EMAIL |
| Pseudocode | Yes | Algorithm 1 Functional Reward Encodings (FRE) |
| Open Source Code | Yes | Code for this project is provided at: github.com/kvfrans/fre. |
| Open Datasets | Yes | We utilize the antmaze-large-diverse-v2 dataset from D4RL (Fu et al., 2020). [...] The ExORL dataset is a standard collection of offline data for RL, consisting of trajectories sampled by an exploratory policy on DeepMind Control Suite (Tassa et al., 2018) tasks. |
| Dataset Splits | No | The paper discusses training procedures and data sampling for encoding/decoding, but does not provide specific train/validation/test dataset splits with percentages or counts for reproduction. |
| Hardware Specification | No | This research used the Savio computational cluster resource provided by the Berkeley Research Computing program at UC Berkeley. This mentions a cluster but lacks specific hardware details like CPU/GPU models. |
| Software Dependencies | No | Appendix A lists the optimizer as "Adam" and mentions "IQL Expectile", but does not specify version numbers for any software, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | Appendix A. Hyperparameters [lists specific values for Batch Size, Training Steps, Learning Rate, Network Layers, etc.] |