Imitating Past Successes can be Very Suboptimal
Authors: Benjamin Eysenbach, Soumith Udatha, Russ R. Salakhutdinov, Sergey Levine
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments study three questions. First, does OCBC fail to converge to the optimal policy in practice, as predicted by our theory? We test this claim on both a bandit problem and a simple 2D navigation problem. Second, on these same tasks, does our proposed fix to OCBC allow the method to converge to the optimal policy, as our theory predicts? Our aim in starting with simple problems is to show that OCBC can fail to converge, even on extremely simple problems. |
| Researcher Affiliation | Academia | Benjamin Eysenbach^α, Soumith Udatha^α, Sergey Levine^β, Ruslan Salakhutdinov^α. ^α Carnegie Mellon University, ^β UC Berkeley. beysenba@cs.cmu.edu |
| Pseudocode | Yes | Algorithm 1 Outcome-conditioned behavioral cloning (OCBC) |
| Open Source Code | Yes | Code to reproduce the didactic experiments is available at https://github.com/ben-eysenbach/normalized-ocbc/blob/main/experiments.ipynb |
| Open Datasets | Yes | We compare to a recent and prototypical implementation of OCBC, GCSL [9]. To give this baseline a strong footing, we use the goal-reaching benchmark proposed in that paper, reporting normalized returns. |
| Dataset Splits | No | The paper does not explicitly provide details on training, validation, and test dataset splits such as percentages or sample counts. |
| Hardware Specification | No | The didactic experiments ran in a few minutes on a desktop CPU. The continuous control experiments ran for a few hours on a desktop GPU. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., libraries, frameworks, or programming language versions) are mentioned in the paper. |
| Experiment Setup | No | Appendix D contains experimental details. |