
Imitating Past Successes can be Very Suboptimal

Authors: Benjamin Eysenbach, Soumith Udatha, Russ R. Salakhutdinov, Sergey Levine

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments study three questions. First, does OCBC fail to converge to the optimal policy in practice, as predicted by our theory? We test this claim on both a bandit problem and a simple 2D navigation problem. Second, on these same tasks, does our proposed fix to OCBC allow the method to converge to the optimal policy, as our theory predicts? Our aim in starting with simple problems is to show that OCBC can fail to converge, even on extremely simple problems.
Researcher Affiliation	Academia	Benjamin Eysenbachα, Soumith Udathaα, Sergey Levineβ, Ruslan Salakhutdinovα. αCarnegie Mellon University, βUC Berkeley. beysenba@cs.cmu.edu
Pseudocode	Yes	Algorithm 1 Outcome-conditioned behavioral cloning (OCBC)
Open Source Code	Yes	Code to reproduce the didactic experiments is available at https://github.com/ben-eysenbach/normalized-ocbc/blob/main/experiments.ipynb
Open Datasets	Yes	We compare to a recent and prototypical implementation of OCBC, GCSL [9]. To give this baseline a strong footing, we use the goal-reaching benchmark proposed in that paper, reporting normalized returns.
Dataset Splits	No	The paper does not explicitly provide details on training, validation, and test dataset splits such as percentages or sample counts.
Hardware Specification	No	The didactic experiments ran in a few minutes on a desktop CPU. The continuous control experiments ran for a few hours on a desktop GPU.
Software Dependencies	No	No specific software dependencies with version numbers (e.g., libraries, frameworks, or programming language versions) are mentioned in the paper.
Experiment Setup	No	Appendix D contains experimental details.