reproducibilityindex.ai

Does Zero-Shot Reinforcement Learning Exist?

Authors: Ahmed Touati, Jérémy Rapin, Yann Ollivier

ICLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We systematically assess successor features (SFs) and forward-backward (FB) representations for zero-shot RL, including many new models of SF basic features, and improved FB loss functions. We use 13 tasks from the Unsupervised RL benchmark (Laskin et al., 2021), repeated on several ExORL training replay buffers (Yarats et al., 2021) to assess robustness to the exploration method. We systematically study the influence of basic features for SFs, by testing SFs on features from ten RL representation learning methods. Some of these, such as latent next state prediction, inverse curiosity module, contrastive learning, or diversity (APS), perform inconsistently. In contrast, FB representations jointly learn the elementary and successor features from a single, principled criterion. They perform best and consistently across the board, reaching 85% of supervised RL performance with a good replay buffer, in a zero-shot manner.
Researcher Affiliation	Industry	Ahmed Touati, Jérémy Rapin & Yann Ollivier Meta AI Research, Paris, {atouati,jrapin,yol}@meta.com
Pseudocode	Yes	Appendix L provides PyTorch snippets for the key losses, notably the FB loss and the SF loss, as well as the various feature learning methods for SF.
Open Source Code	Yes	The code can be found at https://github.com/facebookresearch/controllable_agent
Open Datasets	Yes	We use 13 tasks from the Unsupervised RL benchmark (Laskin et al., 2021), repeated on several ExORL training replay buffers (Yarats et al., 2021) to assess robustness to the exploration method.
Dataset Splits	No	The paper mentions 'training data' and 'test time' but does not explicitly describe specific train/validation/test dataset splits (e.g., percentages, sample counts, or predefined split references).
Hardware Specification	No	The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU models, or memory) used to run its experiments.
Software Dependencies	No	The paper mentions 'PyTorch snippets' in Appendix L, indicating the use of PyTorch, but does not provide specific version numbers for PyTorch or other software dependencies.
Experiment Setup	Yes	Table 1 summarizes the hyperparameters used in our experiments.