Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification
Authors: Ben Eysenbach, Sergey Levine, Russ R. Salakhutdinov
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our approach outperforms prior methods that learn explicit reward functions. |
| Researcher Affiliation | Collaboration | Benjamin Eysenbach¹,², Sergey Levine²,³, Ruslan Salakhutdinov¹. ¹Carnegie Mellon University, ²Google Brain, ³UC Berkeley |
| Pseudocode | Yes | Algorithm 1 Recursive Classification of Examples |
| Open Source Code | Yes | Code is available at: https://github.com/rce-anonymous/rce-anonymous.github.io/tree/main/code |
| Open Datasets | Yes | We evaluate each method on five Sawyer manipulation tasks from Meta-World [39] and two manipulation tasks from Rajeswaran et al. [26]. |
| Dataset Splits | No | The paper mentions datasets used for experiments but does not provide specific details on training, validation, and test dataset splits with percentages or sample counts. |
| Hardware Specification | No | Each experiment took approximately one day on a standard CPU server. The exact compute resources are proprietary. |
| Software Dependencies | No | The paper mentions software like SAC, TD3, TF-Agents, and DAC implementations but does not specify their version numbers. |
| Experiment Setup | Yes | Following prior work [5, 35], we regularized the policy updates by adding an entropy term with coefficient α = 10⁻⁴. We also found that using N-step returns significantly improved the results of RCE (see Appendix F for details and ablation experiments). |