Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification

Authors: Ben Eysenbach, Sergey Levine, Russ R. Salakhutdinov

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that our approach outperforms prior methods that learn explicit reward functions.
Researcher Affiliation | Collaboration | Benjamin Eysenbach (Carnegie Mellon University, Google Brain), Sergey Levine (Google Brain, UC Berkeley), Ruslan Salakhutdinov (Carnegie Mellon University)
Pseudocode | Yes | Algorithm 1: Recursive Classification of Examples (an illustrative sketch of the recursive-classification idea follows this table).
Open Source Code | Yes | Code is available at: https://github.com/rce-anonymous/rce-anonymous.github.io/tree/main/code
Open Datasets | Yes | We evaluate each method on five Sawyer manipulation tasks from Meta-World [39] and two manipulation tasks from Rajeswaran et al. [26].
Dataset Splits | No | The paper mentions datasets used for experiments but does not provide specific details on training, validation, and test dataset splits with percentages or sample counts.
Hardware Specification | No | Each experiment took approximately one day on a standard CPU server. The exact compute resources are proprietary.
Software Dependencies | No | The paper mentions software like SAC, TD3, TF-Agents, and DAC implementations but does not specify their version numbers.
Experiment Setup | Yes | Following prior work [5, 35], we regularized the policy updates by adding an entropy term with coefficient α = 10^-4. We also found that using N-step returns significantly improved the results of RCE (see Appendix F for details and ablation experiments). (A hedged sketch of the entropy term and an N-step target follows this table.)
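
The Pseudocode row refers to Algorithm 1, Recursive Classification of Examples (RCE). The snippet below is a minimal, non-authoritative sketch of the recursive-classification idea as described at a high level: a classifier over state-action pairs is trained with success examples as positives, while the labels for replay-buffer transitions are bootstrapped from the classifier's own prediction at the next state. All names (SuccessClassifier, rce_classifier_loss, policy, gamma) are placeholders, and the simplified label weighting here is an assumption rather than the paper's exact update rule.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SuccessClassifier(nn.Module):
    """Classifier C(s, a) whose sigmoid output estimates the chance that
    following the policy from (s, a) eventually reaches a success example."""

    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)  # logits


def rce_classifier_loss(classifier, target_classifier, policy,
                        success_obs, obs, act, next_obs, gamma=0.99):
    """One recursive-classification step (illustrative, simplified weighting).

    success_obs:          observations from the user-provided success examples
    (obs, act, next_obs): transitions sampled from the replay buffer
    """
    # Positive term: success examples, paired with the policy's action, get label 1.
    pos_logits = classifier(success_obs, policy(success_obs))
    pos_loss = F.binary_cross_entropy_with_logits(
        pos_logits, torch.ones_like(pos_logits))

    # Recursive term: the label for a replay transition is bootstrapped from a
    # frozen target classifier evaluated at the next state-action pair.
    with torch.no_grad():
        next_prob = torch.sigmoid(target_classifier(next_obs, policy(next_obs)))
        soft_label = gamma * next_prob  # simplification of the paper's TD-style weight
    boot_logits = classifier(obs, act)
    boot_loss = F.binary_cross_entropy_with_logits(boot_logits, soft_label)

    return pos_loss + boot_loss
```

In this sketch the policy would then be updated to pick actions that maximize the classifier's success probability, analogous to maximizing a Q-function in standard actor-critic code.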
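The Experiment Setup row mentions an entropy term with coefficient α = 10^-4 and N-step returns. As a rough, generic illustration (not the authors' code; function and variable names are hypothetical, and the discount factor is assumed), an N-step bootstrapped target and a one-sample entropy-regularized policy objective could look like this. Note that RCE itself learns without an explicit reward function, so in the paper the N-step bootstrap would apply to the classifier targets rather than to a reward sum.

```python
ALPHA = 1e-4   # entropy coefficient reported in the paper
GAMMA = 0.99   # assumed discount factor (not stated in this table)

def n_step_target(per_step_terms, bootstrap_value, gamma=GAMMA):
    """Generic N-step target: t_0 + gamma*t_1 + ... + gamma^(n-1)*t_(n-1) + gamma^n * bootstrap."""
    target = bootstrap_value
    for t in reversed(per_step_terms):
        target = t + gamma * target
    return target

def entropy_regularized_policy_loss(q_value, log_prob, alpha=ALPHA):
    """Maximize the critic value plus a small entropy bonus; -log_prob of the
    sampled action is a one-sample estimate of the policy entropy.
    Assumes q_value and log_prob are torch tensors over a batch."""
    return -(q_value - alpha * log_prob).mean()
```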