Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Imitating Past Successes can be Very Suboptimal
Authors: Benjamin Eysenbach, Soumith Udatha, Russ R. Salakhutdinov, Sergey Levine
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments study three questions. First, does OCBC fail to converge to the optimal policy in practice, as predicted by our theory? We test this claim on both a bandit problem and a simple 2D navigation problem. Second, on these same tasks, does our proposed fix to OCBC allow the method to converge to the optimal policy, as our theory predicts? Our aim in starting with simple problems is to show that OCBC can fail to converge, even on extremely simple problems. |
| Researcher Affiliation | Academia | Benjamin Eysenbach (α), Soumith Udatha (α), Sergey Levine (β), Ruslan Salakhutdinov (α); α: Carnegie Mellon University, β: UC Berkeley; EMAIL |
| Pseudocode | Yes | Algorithm 1 Outcome-conditioned behavioral cloning (OCBC) |
| Open Source Code | Yes | Code to reproduce the didactic experiments is available at https://github.com/ben-eysenbach/normalized-ocbc/blob/main/experiments.ipynb |
| Open Datasets | Yes | We compare to a recent and prototypical implementation of OCBC, GCSL [9]. To give this baseline a strong footing, we use the goal-reaching benchmark proposed in that paper, reporting normalized returns. |
| Dataset Splits | No | The paper does not explicitly provide details on training, validation, and test dataset splits such as percentages or sample counts. |
| Hardware Specification | No | The didactic experiments ran in a few minutes on a desktop CPU. The continuous control experiments ran for a few hours on a desktop GPU. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., libraries, frameworks, or programming language versions) are mentioned in the paper. |
| Experiment Setup | No | Appendix D contains experimental details. |