Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification
Authors: Ben Eysenbach, Sergey Levine, Russ R. Salakhutdinov
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our approach outperforms prior methods that learn explicit reward functions. |
| Researcher Affiliation | Collaboration | Benjamin Eysenbach (Carnegie Mellon University, Google Brain), Sergey Levine (Google Brain, UC Berkeley), Ruslan Salakhutdinov (Carnegie Mellon University) |
| Pseudocode | Yes | Algorithm 1 Recursive Classification of Examples (a minimal sketch of this update appears below the table) |
| Open Source Code | Yes | Code is available at: https://github.com/rce-anonymous/rce-anonymous.github.io/tree/main/code |
| Open Datasets | Yes | We evaluate each method on five Sawyer manipulation tasks from Meta-World [39] and two manipulation tasks from Rajeswaran et al. [26]. |
| Dataset Splits | No | The paper mentions datasets used for experiments but does not provide specific details on training, validation, and test dataset splits with percentages or sample counts. |
| Hardware Specification | No | Each experiment took approximately one day on a standard CPU server. The exact compute resources are proprietary. |
| Software Dependencies | No | The paper mentions software like SAC, TD3, TF-Agents, and DAC implementations but does not specify their version numbers. |
| Experiment Setup | Yes | Following prior work [5, 35], we regularized the policy updates by adding an entropy term with coefficient α = 10⁻⁴. We also found that using N-step returns significantly improved the results of RCE (see Appendix F for details and ablation experiments). A hedged sketch of the entropy term appears below the table. |
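
The Pseudocode row points to Algorithm 1, Recursive Classification of Examples. Below is a minimal sketch of the classifier update that Algorithm 1 describes, written in PyTorch under stated assumptions: `classifier`, `policy`, `success_states`, and `transitions` are hypothetical stand-ins, and the loss follows the paper's cross-entropy form with bootstrapped weights w = C(s', a') / (1 − C(s', a')). It is not the authors' released implementation (which builds on TF-Agents).

```python
import torch
import torch.nn.functional as F

def rce_classifier_loss(classifier, policy, success_states, transitions, gamma=0.99):
    """Sketch of the RCE classifier update from Algorithm 1 (assumptions noted above).

    classifier(s, a) returns logits for p(future success | s, a);
    policy(s) returns a batch of actions. All names are hypothetical.
    """
    s, a, s_next = transitions                 # replay-buffer batch
    a_star = policy(success_states)            # actions the policy would take at success examples
    a_next = policy(s_next)                    # actions at the next states

    # Bootstrapped weight w = C(s', a') / (1 - C(s', a')), computed without gradients.
    with torch.no_grad():
        c_next = torch.sigmoid(classifier(s_next, a_next))
        w = c_next / (1.0 - c_next)

    logits_success = classifier(success_states, a_star)
    logits_replay = classifier(s, a)

    # Success examples: push C(s*, a*) toward 1, weighted by (1 - gamma).
    loss_success = (1.0 - gamma) * F.binary_cross_entropy_with_logits(
        logits_success, torch.ones_like(logits_success))

    # Replay transitions: recursive soft label gamma*w / (1 + gamma*w), weighted by (1 + gamma*w).
    soft_label = (gamma * w) / (1.0 + gamma * w)
    per_example = F.binary_cross_entropy_with_logits(
        logits_replay, soft_label, reduction="none")
    loss_replay = ((1.0 + gamma * w) * per_example).mean()

    return loss_success + loss_replay
```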
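
The Experiment Setup row mentions an entropy term with coefficient α = 10⁻⁴ added to the policy updates. The following is a hedged sketch of how such an entropy-regularized actor loss could look; `classifier` and `policy_dist` are hypothetical names, and the authors' actual update (built on SAC-style training in TF-Agents) may differ in detail.

```python
import torch

ALPHA = 1e-4  # entropy coefficient reported in the Experiment Setup row

def entropy_regularized_actor_loss(classifier, policy_dist, states):
    """Sketch of an entropy-regularized policy update (assumptions noted above).

    policy_dist(states) is assumed to return a torch.distributions object
    (e.g. a factorized Gaussian) that supports reparameterized sampling.
    """
    dist = policy_dist(states)
    actions = dist.rsample()                    # reparameterized sample for pathwise gradients
    log_prob = dist.log_prob(actions).sum(-1)   # sum over action dimensions

    # Maximize the classifier's success logit while keeping the policy entropic:
    # penalizing log_prob (scaled by ALPHA) acts as an entropy bonus.
    success = classifier(states, actions).squeeze(-1)
    return (-success + ALPHA * log_prob).mean()
```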