Does Zero-Shot Reinforcement Learning Exist?

Authors: Ahmed Touati, Jérémy Rapin, Yann Ollivier

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We systematically assess SFs and FB for zero-shot RL, including many new models of SF basic features and improved FB loss functions. We use 13 tasks from the Unsupervised RL benchmark (Laskin et al., 2021), repeated on several ExORL training replay buffers (Yarats et al., 2021) to assess robustness to the exploration method. We systematically study the influence of basic features for SFs by testing SFs on features from ten RL representation learning methods. SFs built on learned features, such as latent next state prediction, inverse curiosity module, contrastive learning, or diversity (APS), perform inconsistently. In contrast, FB representations jointly learn the elementary and successor features from a single, principled criterion. They perform best and consistently across the board, reaching 85% of supervised RL performance with a good replay buffer, in a zero-shot manner.
Researcher Affiliation | Industry | Ahmed Touati, Jérémy Rapin & Yann Ollivier, Meta AI Research, Paris, {atouati,jrapin,yol}@meta.com
Pseudocode | Yes | Appendix L provides PyTorch snippets for the key losses, notably the FB loss and the SF loss, as well as the various feature learning methods for SF (a hedged sketch of an FB-style loss is given after the table).
Open Source Code | Yes | The code can be found at https://github.com/facebookresearch/controllable_agent
Open Datasets | Yes | We use 13 tasks from the Unsupervised RL benchmark (Laskin et al., 2021), repeated on several ExORL training replay buffers (Yarats et al., 2021) to assess robustness to the exploration method.
Dataset Splits | No | The paper mentions 'training data' and 'test time' but does not explicitly describe specific train/validation/test dataset splits (e.g., percentages, sample counts, or predefined split references).
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU models, or memory) used to run its experiments.
Software Dependencies | No | The paper mentions 'PyTorch snippets' in Appendix L, indicating the use of PyTorch, but does not provide specific version numbers for PyTorch or other software dependencies.
Experiment Setup | Yes | Table 1 summarizes the hyperparameters used in our experiments.
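
Appendix L of the paper contains the authors' PyTorch snippets for these losses; they are not reproduced here. The following is only a minimal, hedged sketch of what an FB-style temporal-difference loss can look like in PyTorch. The function and network names (fb_loss, f_net, b_net, f_target, b_target, actor) and the batching convention are assumptions made for illustration, not the authors' exact implementation.

```python
# Minimal sketch of an FB-style temporal-difference loss in PyTorch.
# All names and shapes below are illustrative assumptions, not the
# authors' Appendix L code.
import torch


def fb_loss(f_net, b_net, f_target, b_target, actor,
            obs, action, next_obs, future_obs, z, discount=0.98):
    """Sketch of a forward-backward measure-matching TD loss.

    obs, action, next_obs : batch of transitions, shape (batch, dim)
    future_obs            : independently sampled states used as measure targets
    z                     : task embeddings, shape (batch, z_dim)
    """
    with torch.no_grad():
        next_action = actor(next_obs, z)                   # pi_z(s')
        target_f = f_target(next_obs, next_action, z)      # (batch, d)
        target_b = b_target(future_obs)                    # (batch, d)
        # target_m[i, j] ~ gamma * F(s'_i, pi_z(s'_i), z_i)^T B(s~_j)
        target_m = discount * target_f @ target_b.T        # (batch, batch)

    f_out = f_net(obs, action, z)                          # (batch, d)
    b_out = b_net(future_obs)                              # (batch, d)
    m = f_out @ b_out.T                                    # (batch, batch)

    # Squared TD term over pairs (transition i, measure sample j); the linear
    # term accounts for the occupancy of the observed next state.
    td_term = 0.5 * (m - target_m).pow(2).mean()
    immediate_term = -(f_out * b_net(next_obs)).sum(dim=-1).mean()
    return td_term + immediate_term
```

In this sketch, the forward network F and backward network B are trained so that F(s, a, z)^T B(s') roughly tracks the discounted successor measure of the policy pi_z, which is the quantity the paper's zero-shot procedure relies on at test time; target networks and the separate batch of future states are common implementation choices assumed here rather than details taken from the paper.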