Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Imitating Past Successes can be Very Suboptimal

Authors: Benjamin Eysenbach, Soumith Udatha, Russ R. Salakhutdinov, Sergey Levine

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments study three questions. First, does OCBC fail to converge to the optimal policy in practice, as predicted by our theory? We test this claim on both a bandit problem and a simple 2D navigation problem. Second, on these same tasks, does our proposed fix to OCBC allow the method to converge to the optimal policy, as our theory predicts? Our aim in starting with simple problems is to show that OCBC can fail to converge, even on extremely simple problems.
Researcher Affiliation | Academia | Benjamin Eysenbach (α), Soumith Udatha (α), Sergey Levine (β), Ruslan Salakhutdinov (α). α: Carnegie Mellon University; β: UC Berkeley. EMAIL
Pseudocode | Yes | Algorithm 1: Outcome-conditioned behavioral cloning (OCBC)
Open Source Code | Yes | Code to reproduce the didactic experiments is available (footnote 3: https://github.com/ben-eysenbach/normalized-ocbc/blob/main/experiments.ipynb).
Open Datasets | Yes | We compare to a recent and prototypical implementation of OCBC, GCSL [9]. To give this baseline a strong footing, we use the goal-reaching benchmark proposed in that paper, reporting normalized returns.
Dataset Splits | No | The paper does not explicitly provide details on training, validation, and test dataset splits such as percentages or sample counts.
Hardware Specification | No | The didactic experiments ran in a few minutes on a desktop CPU. The continuous control experiments ran for a few hours on a desktop GPU.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., libraries, frameworks, or programming language versions) are mentioned in the paper.
Experiment Setup | No | Appendix D contains experimental details.