Sequence Model Imitation Learning with Unobserved Contexts
Authors: Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments in a toy bandit domain that show that there exist sharp phase transitions of whether off-policy approaches are able to match expert performance asymptotically, in contrast to the uniformly good performance of on-policy approaches. We demonstrate that on several continuous control tasks, on-policy approaches are able to use history to identify the context while off-policy approaches actually perform worse when given access to history. |
| Researcher Affiliation | Collaboration | Gokul Swamy (Carnegie Mellon University, gswamy@cmu.edu); Sanjiban Choudhury (Cornell University, sanjibanc@cornell.edu); J. Andrew Bagnell (Aurora Innovation and Carnegie Mellon University, dbagnell@ri.cmu.edu); Zhiwei Steven Wu (Carnegie Mellon University, zstevenwu@cmu.edu) |
| Pseudocode | No | The paper does not contain any explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | Yes | We release our code at https://github.com/gkswamy98/sequence_model_il. |
| Open Datasets | No | The paper mentions using "PyBullet tasks" but does not provide concrete access information (link, DOI, or a specific citation with authors and year) for the exact dataset instances used, beyond the general simulation framework. |
| Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits (e.g., percentages, sample counts, or citations to predefined splits). |
| Hardware Specification | No | The paper mentions a "GPU award from NVIDIA" in the acknowledgments but does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper discusses tasks and algorithms (e.g., "PyBullet tasks", "BC", "DAgger") but does not list specific software components with version numbers. |
| Experiment Setup | No | The paper mentions using "history (of length 5 for all experiments)" but does not provide specific hyperparameter values or other detailed training configurations for the experimental setup (a hedged illustrative sketch of such a history-conditioned setup follows this table). |
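The table above notes that the paper conditions policies on a history of length 5 and compares off-policy behavior cloning (BC) against on-policy, DAgger-style learners on PyBullet continuous control tasks. As a hedged illustration only, the minimal PyTorch sketch below shows one way a history-conditioned behavior-cloning learner of that shape could be set up; the network sizes, function names, and training step are assumptions and are not taken from the paper or its released repository.

```python
# Hypothetical sketch of a history-conditioned behavior-cloning policy.
# The history length of 5 comes from the paper's text; everything else
# (architecture, names, loss) is an illustrative assumption.
import torch
import torch.nn as nn

HISTORY_LEN = 5  # "history (of length 5 for all experiments)" per the paper


class HistoryConditionedPolicy(nn.Module):
    """MLP mapping a stacked window of the last HISTORY_LEN observations
    and actions to the next action."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        in_dim = HISTORY_LEN * (obs_dim + act_dim)
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs_hist: torch.Tensor, act_hist: torch.Tensor) -> torch.Tensor:
        # obs_hist: (batch, HISTORY_LEN, obs_dim); act_hist: (batch, HISTORY_LEN, act_dim)
        x = torch.cat([obs_hist.flatten(1), act_hist.flatten(1)], dim=-1)
        return self.net(x)


def bc_update(policy: HistoryConditionedPolicy,
              optimizer: torch.optim.Optimizer,
              batch: dict) -> float:
    """One off-policy (behavior cloning) step on expert (history, action) pairs."""
    pred = policy(batch["obs_hist"], batch["act_hist"])
    loss = nn.functional.mse_loss(pred, batch["expert_action"])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

A DAgger-style variant would instead roll this policy out in the environment and query the expert on the states the learner itself visits; that on-policy interaction is what the paper argues lets history be used to identify the unobserved context, whereas purely off-policy training on expert histories can perform worse when given access to history.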