Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Reinforcement Learning with Prototypical Representations
Authors: Denis Yarats, Rob Fergus, Alessandro Lazaric, Lerrel Pinto
ICML 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we discuss empirical results on using Proto-RL for learning visual representations. (Section 5) and Proto-RL significantly improves upon Random exploration and APT across all environments, while being better than Curiosity based exploration in 7/8 environments. (Section 5.2) |
| Researcher Affiliation | Collaboration | ¹New York University, ²Facebook AI Research. |
| Pseudocode | Yes | The pseudo-code for our framework is provided in Appendix C. (Section 4.1) and Algorithm 1: Proto-RL (Appendix C). |
| Open Source Code | Yes | We open-source our code at https://github.com/denisyarats/proto. |
| Open Datasets | Yes | We use the DeepMind Control Suite (Tassa et al., 2018), a challenging benchmark for image-based RL. |
| Dataset Splits | No | The paper describes environment interactions and evaluation episodes, but it does not specify explicit training/validation/test dataset splits like percentages or sample counts for a static dataset. In RL, data is dynamically generated. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models (e.g., NVIDIA V100), CPU models, or cloud computing instance types used for the experiments. |
| Software Dependencies | Yes | Proto-RL is trained using Adam (Kingma & Ba, 2014)... We use SAC implementation from Yarats & Kostrikov (2020). |
| Experiment Setup | Yes | Hyper-parameters: Proto-RL is trained using Adam (Kingma & Ba, 2014) with learning rate 10⁻⁴ and mini-batch size of 512. The downstream exploration hyper-parameter is set to 0.2 and the number of cluster candidates is set to T = 4. We use SAC implementation from Yarats & Kostrikov (2020). |
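
For readers who want to carry the reported experiment setup into their own scripts, the values quoted in the last row can be collected in a small configuration object. The following is a minimal sketch, not the authors' code: the class name `ProtoRLSetup` and all field names are hypothetical, the learning rate is read as 10⁻⁴, and the field name for the downstream exploration hyper-parameter is an assumption because the table does not give its symbol.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ProtoRLSetup:
    """Hypothetical container for the hyper-parameters quoted in the table.

    Field names are illustrative and are not taken from the paper's code.
    """

    optimizer: str = "adam"              # Adam (Kingma & Ba, 2014)
    learning_rate: float = 1e-4          # assumed reading of the reported value
    batch_size: int = 512                # mini-batch size
    exploration_coef: float = 0.2        # downstream exploration hyper-parameter
    num_cluster_candidates: int = 4      # T in the table

    def validate(self) -> None:
        """Raise ValueError if any value is outside a sensible range."""
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")
        if self.batch_size <= 0:
            raise ValueError("batch_size must be positive")
        if self.exploration_coef < 0:
            raise ValueError("exploration_coef must be non-negative")
        if self.num_cluster_candidates < 1:
            raise ValueError("num_cluster_candidates must be at least 1")


setup = ProtoRLSetup()
setup.validate()
config_dict = asdict(setup)  # plain dict, e.g. for logging alongside results
```

Keeping the values in a frozen dataclass makes accidental edits during a run fail loudly, and `asdict` gives a plain dictionary that can be logged with each experiment.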