Sequence Model Imitation Learning with Unobserved Contexts

Authors: Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct experiments in a toy bandit domain that show that there exist sharp phase transitions in whether off-policy approaches are able to match expert performance asymptotically, in contrast to the uniformly good performance of on-policy approaches. We demonstrate that on several continuous control tasks, on-policy approaches are able to use history to identify the context while off-policy approaches actually perform worse when given access to history."
Researcher Affiliation | Collaboration | Gokul Swamy (Carnegie Mellon University, gswamy@cmu.edu); Sanjiban Choudhury (Cornell University, sanjibanc@cornell.edu); J. Andrew Bagnell (Aurora Innovation and Carnegie Mellon University, dbagnell@ri.cmu.edu); Zhiwei Steven Wu (Carnegie Mellon University, zstevenwu@cmu.edu)
Pseudocode | No | The paper does not contain any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | Yes | "We release our code at https://github.com/gkswamy98/sequence_model_il."
Open Datasets | No | The paper mentions using "PyBullet tasks" but does not provide concrete access information (a link, DOI, or specific citation for the exact dataset instances used) for a publicly available or open dataset beyond the general framework.
Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits (e.g., percentages, sample counts, or citations to predefined splits).
Hardware Specification | No | The paper mentions a "GPU award from NVIDIA" in the acknowledgments but does not describe the specific hardware (e.g., GPU/CPU models, memory) used to run its experiments.
Software Dependencies | No | The paper discusses tasks and algorithms (e.g., "PyBullet tasks", "BC", "DAgger") but does not list specific software components with version numbers.
Experiment Setup | No | The paper mentions using "history (of length 5 for all experiments)" but does not provide specific hyperparameter values or other detailed training configurations for the experimental setup.
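The "history (of length 5 for all experiments)" noted in the Experiment Setup row can be illustrated with a minimal sketch. The `HistoryBuffer` class below is hypothetical and is not taken from the paper or its released code; it simply shows one common way a policy could be fed a stacked window of the last 5 observations, zero-padded at episode start.

```python
from collections import deque

import numpy as np


class HistoryBuffer:
    """Hypothetical sketch of an observation-history buffer (not the
    authors' implementation). Keeps the last `history_len` observations
    and returns them concatenated into a single flat vector, padding
    with zeros until enough observations have been seen."""

    def __init__(self, obs_dim: int, history_len: int = 5):
        self.obs_dim = obs_dim
        self.history_len = history_len
        # deque with maxlen automatically discards the oldest entry.
        self.buffer = deque(maxlen=history_len)

    def reset(self):
        """Clear the buffer at the start of a new episode."""
        self.buffer.clear()

    def push(self, obs: np.ndarray) -> np.ndarray:
        """Add the newest observation and return the stacked history."""
        self.buffer.append(np.asarray(obs, dtype=np.float64))
        # Zero-pad on the left until the buffer holds history_len items.
        pad = [np.zeros(self.obs_dim)] * (self.history_len - len(self.buffer))
        return np.concatenate(pad + list(self.buffer))


buf = HistoryBuffer(obs_dim=2, history_len=5)
stacked = buf.push(np.array([1.0, 2.0]))
# stacked has shape (10,): four zero observations followed by [1.0, 2.0]
```

A policy network would then take this length-`obs_dim * history_len` vector as input instead of a single observation, which is one standard way sequence-model policies consume history in continuous control tasks.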