Imitating Past Successes can be Very Suboptimal

Authors: Benjamin Eysenbach, Soumith Udatha, Russ R. Salakhutdinov, Sergey Levine

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments study three questions. First, does OCBC fail to converge to the optimal policy in practice, as predicted by our theory? We test this claim on both a bandit problem and a simple 2D navigation problem. Second, on these same tasks, does our proposed fix to OCBC allow the method to converge to the optimal policy, as our theory predicts? Our aim in starting with simple problems is to show that OCBC can fail to converge, even on extremely simple problems.
Researcher Affiliation | Academia | Benjamin Eysenbach^α, Soumith Udatha^α, Sergey Levine^β, Ruslan Salakhutdinov^α; ^α Carnegie Mellon University, ^β UC Berkeley; beysenba@cs.cmu.edu
Pseudocode | Yes | Algorithm 1: Outcome-conditioned behavioral cloning (OCBC)
Open Source Code | Yes | Code to reproduce the didactic experiments is available. [Footnote 3] https://github.com/ben-eysenbach/normalized-ocbc/blob/main/experiments.ipynb
Open Datasets | Yes | We compare to a recent and prototypical implementation of OCBC, GCSL [9]. To give this baseline a strong footing, we use the goal-reaching benchmark proposed in that paper, reporting normalized returns.
Dataset Splits | No | The paper does not explicitly provide details on training, validation, and test dataset splits such as percentages or sample counts.
Hardware Specification | No | The didactic experiments ran in a few minutes on a desktop CPU. The continuous control experiments ran for a few hours on a desktop GPU.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., libraries, frameworks, or programming language versions) are mentioned in the paper.
Experiment Setup | No | Appendix D contains experimental details.
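
For orientation, the relabel-and-imitate update named in the Pseudocode row (Algorithm 1, OCBC) can be sketched on a tabular bandit like the one mentioned in the Research Type row. This is a minimal sketch under stated assumptions, not the authors' code (their notebook is linked in the Open Source Code row): the function name ocbc_step, the 3-action, 2-outcome dynamics matrix, and the uniform prior over commanded outcomes are all hypothetical choices made for illustration.

```python
import numpy as np


def ocbc_step(policy, outcome_given_action, goal_prior):
    """One exact (tabular) OCBC update on a bandit. Illustrative only.

    policy[g, a]               = pi(a | g), outcome-conditioned policy
    outcome_given_action[a, g] = p(outcome g | action a)  (assumed dynamics)
    goal_prior[g]              = p(g), distribution over commanded outcomes
    """
    # Marginal action distribution of the data-collecting policy.
    action_marginal = goal_prior @ policy                      # shape (A,)
    # Joint distribution over (action, achieved outcome) in the collected data.
    joint = action_marginal[:, None] * outcome_given_action    # shape (A, G)
    # Relabel with the achieved outcome and imitate:
    # pi_new(a | g) = p(action a | achieved outcome g).
    new_policy = joint.T                                       # shape (G, A)
    totals = new_policy.sum(axis=1, keepdims=True)
    # Keep the old row for any outcome that never appears in the data.
    return np.where(totals > 0, new_policy / np.maximum(totals, 1e-12), policy)


# Hypothetical bandit: 3 actions, 2 outcomes (numbers are made up).
outcome_given_action = np.array([[0.9, 0.1],
                                 [0.5, 0.5],
                                 [0.2, 0.8]])
goal_prior = np.array([0.5, 0.5])

# Start from a uniform policy and iterate the update a few times.
policy = np.full((2, 3), 1.0 / 3.0)
for _ in range(5):
    policy = ocbc_step(policy, outcome_given_action, goal_prior)

print(policy)
```

The sketch shows only the update itself. Whether iterating it reaches the optimal policy is the question the paper's experiments address, and the sketch makes no claim either way.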