Does Zero-Shot Reinforcement Learning Exist?
Authors: Ahmed Touati, Jérémy Rapin, Yann Ollivier
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We systematically assess SFs and FB for zero-shot RL, including many new models of SF basic features, and improved FB loss functions. We use 13 tasks from the Unsupervised RL benchmark (Laskin et al., 2021), repeated on several ExORL training replay buffers (Yarats et al., 2021) to assess robustness to the exploration method. We systematically study the influence of basic features for SFs, by testing SFs on features from ten RL representation learning methods. Some, such as latent next state prediction, inverse curiosity module, contrastive learning, or diversity (APS), perform inconsistently. In contrast, FB representations jointly learn the elementary and successor features from a single, principled criterion. They perform best and consistently across the board, reaching 85% of supervised RL performance with a good replay buffer, in a zero-shot manner. |
| Researcher Affiliation | Industry | Ahmed Touati, Jérémy Rapin & Yann Ollivier; Meta AI Research, Paris; {atouati,jrapin,yol}@meta.com |
| Pseudocode | Yes | Appendix L provides PyTorch snippets for the key losses, notably the FB loss, the SF loss, as well as the various feature learning methods for SF. |
| Open Source Code | Yes | The code can be found at https://github.com/facebookresearch/controllable_agent |
| Open Datasets | Yes | We use 13 tasks from the Unsupervised RL benchmark (Laskin et al., 2021), repeated on several ExORL training replay buffers (Yarats et al., 2021) to assess robustness to the exploration method. |
| Dataset Splits | No | The paper mentions 'training data' and 'test time' but does not explicitly describe specific train/validation/test dataset splits (e.g., percentages, sample counts, or predefined split references). |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU models, or memory) used to run its experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch snippets' in Appendix L, indicating the use of PyTorch, but does not provide specific version numbers for PyTorch or other software dependencies. |
| Experiment Setup | Yes | Table 1 summarizes the hyperparameters used in our experiments. |