Sequence Model Imitation Learning with Unobserved Contexts

Authors: Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct experiments in a toy bandit domain that show that there exist sharp phase transitions in whether off-policy approaches are able to match expert performance asymptotically, in contrast to the uniformly good performance of on-policy approaches. We demonstrate that on several continuous control tasks, on-policy approaches are able to use history to identify the context while off-policy approaches actually perform worse when given access to history."
Researcher Affiliation | Collaboration | Gokul Swamy (Carnegie Mellon University, gswamy@cmu.edu); Sanjiban Choudhury (Cornell University, sanjibanc@cornell.edu); J. Andrew Bagnell (Aurora Innovation and Carnegie Mellon University, dbagnell@ri.cmu.edu); Zhiwei Steven Wu (Carnegie Mellon University, zstevenwu@cmu.edu)
Pseudocode | No | The paper does not contain any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | Yes | "We release our code at https://github.com/gkswamy98/sequence_model_il."
Open Datasets | No | The paper mentions using "PyBullet tasks" but does not provide concrete access information (a link, DOI, or specific citation for the exact dataset instances used) for a publicly available or open dataset beyond the general framework.
Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits (e.g., percentages, sample counts, or citations to predefined splits).
Hardware Specification | No | The paper mentions a "GPU award from NVIDIA" in the acknowledgments but does not describe the specific hardware (e.g., GPU/CPU models, memory) used to run its experiments.
Software Dependencies | No | The paper discusses tasks and algorithms (e.g., "PyBullet tasks", "BC", "DAgger") but does not list specific software components with version numbers.
Experiment Setup | No | The paper mentions using "history (of length 5 for all experiments)" but does not provide specific hyperparameter values or other detailed training configurations for the experimental setup.
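The "history (of length 5 for all experiments)" noted in the Experiment Setup row can be illustrated with a minimal sketch. The `HistoryBuffer` class below is hypothetical and is not taken from the paper or its released code; it simply shows one common way a policy could be fed a stacked window of the last 5 observations, zero-padded at episode start.

```python
from collections import deque

import numpy as np


class HistoryBuffer:
    """Hypothetical sketch of an observation-history buffer (not the
    authors' implementation). Keeps the last `history_len` observations
    and returns them concatenated into a single flat vector, padding
    with zeros until enough observations have been seen."""

    def __init__(self, obs_dim: int, history_len: int = 5):
        self.obs_dim = obs_dim
        self.history_len = history_len
        # deque with maxlen automatically discards the oldest entry.
        self.buffer = deque(maxlen=history_len)

    def reset(self):
        """Clear the buffer at the start of a new episode."""
        self.buffer.clear()

    def push(self, obs: np.ndarray) -> np.ndarray:
        """Add the newest observation and return the stacked history."""
        self.buffer.append(np.asarray(obs, dtype=np.float64))
        # Zero-pad on the left until the buffer holds history_len items.
        pad = [np.zeros(self.obs_dim)] * (self.history_len - len(self.buffer))
        return np.concatenate(pad + list(self.buffer))


buf = HistoryBuffer(obs_dim=2, history_len=5)
stacked = buf.push(np.array([1.0, 2.0]))
# stacked has shape (10,): four zero observations followed by [1.0, 2.0]
```

A policy network would then take this length-`obs_dim * history_len` vector as input instead of a single observation, which is one standard way sequence-model policies consume history in continuous control tasks.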