reproducibilityindex.ai

Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification

Authors: Ben Eysenbach, Sergey Levine, Russ R. Salakhutdinov

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments show that our approach outperforms prior methods that learn explicit reward functions.
Researcher Affiliation	Collaboration	Benjamin Eysenbach (1, 2), Sergey Levine (2, 3), Ruslan Salakhutdinov (1); 1: Carnegie Mellon University, 2: Google Brain, 3: UC Berkeley
Pseudocode	Yes	Algorithm 1 Recursive Classification of Examples
Open Source Code	Yes	Code is available at: https://github.com/rce-anonymous/rce-anonymous.github.io/tree/main/code
Open Datasets	Yes	We evaluate each method on five Sawyer manipulation tasks from Meta-World [39] and two manipulation tasks from Rajeswaran et al. [26].
Dataset Splits	No	The paper mentions datasets used for experiments but does not provide specific details on training, validation, and test dataset splits with percentages or sample counts.
Hardware Specification	No	Each experiment took approximately one day on a standard CPU server. The exact compute resources are proprietary.
Software Dependencies	No	The paper mentions software like SAC, TD3, TF-Agents, and DAC implementations but does not specify their version numbers.
Experiment Setup	Yes	Following prior work [5, 35], we regularized the policy updates by adding an entropy term with coefficient α = 10^-4. We also found that using N-step returns significantly improved the results of RCE (see Appendix F for details and ablation experiments).
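
To make the entropy term in the Experiment Setup row concrete, here is a minimal sketch of an entropy-regularized actor loss in the style of SAC. It is an illustration under stated assumptions, not the authors' implementation: the function name, argument names, and the use of plain Python lists are hypothetical, and the coefficient is assumed to be α = 10^-4.

```python
# Illustrative sketch only; not taken from the paper's released code.
# Assumption: the entropy coefficient in the setup row is alpha = 1e-4.
ALPHA = 1e-4


def entropy_regularized_actor_loss(log_probs, critic_values, alpha=ALPHA):
    """Return mean(alpha * log pi(a|s) - C(s, a)) over a batch.

    log_probs:     log-probabilities of the sampled actions under the policy.
    critic_values: critic (or classifier) scores for the same state-action pairs.
    Minimizing this loss raises the critic score while the alpha term
    penalizes low-entropy (overconfident) policies.
    """
    if not log_probs or len(log_probs) != len(critic_values):
        raise ValueError("inputs must be non-empty and of equal length")
    terms = [alpha * lp - c for lp, c in zip(log_probs, critic_values)]
    return sum(terms) / len(terms)


# Hypothetical batch of two sampled actions.
example_loss = entropy_regularized_actor_loss([-1.2, -0.7], [0.4, 0.9])
```

With a coefficient this small, the entropy term only nudges the policy; the critic score dominates the loss unless the log-probabilities become very large in magnitude.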