Mitigating Covariate Shift in Imitation Learning via Offline Data With Partial Coverage

Authors: Jonathan Chang, Masatoshi Uehara, Dhruv Sreenivas, Rahul Kidambi, Wen Sun

NeurIPS 2021

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental
LLM Response: Complementing our theory results, we also demonstrate that a practical implementation of our approach mitigates covariate shift on benchmark MuJoCo continuous control tasks. We demonstrate that with behavior policies whose performances are less than half of that of the expert, MILO still successfully imitates with an extremely low number of expert state-action pairs while traditional offline IL methods such as behavior cloning (BC) fail completely. Source code is provided at https://github.com/jdchang1/milo.
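The contrast with behavior cloning is the operative claim here: BC reduces imitation to supervised regression on the expert state-action pairs, so once small action errors push the policy into states the expert never visited, there is no corrective signal. A minimal runnable sketch of that baseline with synthetic data (the linear policy and all numbers below are illustrative assumptions, not the paper's setup):

```python
# Minimal behavior-cloning baseline: regress actions on states by least
# squares. All data here is synthetic and for illustration only; the paper
# uses expert demonstrations from MuJoCo tasks.
import numpy as np

rng = np.random.default_rng(0)
states = rng.normal(size=(50, 11))   # 50 expert states, 11-dim (hypothetical)
expert_W = rng.normal(size=(11, 3))  # unknown "expert" linear policy
actions = states @ expert_W          # expert actions, 3-dim

# Behavior cloning: fit a linear policy minimizing ||S W - A||^2.
W_bc, *_ = np.linalg.lstsq(states, actions, rcond=None)

# At deployment, small action errors compound and the policy drifts into
# states unlike the training states, where the regression has no signal --
# exactly the covariate shift MILO is designed to mitigate.
test_state = rng.normal(size=11)
print("BC action on a novel state:", test_state @ W_bc)
```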
Researcher Affiliation: Collaboration
LLM Response:
- Jonathan D. Chang, Department of Computer Science, Cornell University (jdc396@cornell.edu)
- Masatoshi Uehara, Department of Computer Science, Cornell University (mu223@cornell.edu)
- Dhruv Sreenivas, Department of Computer Science, Cornell University (ds844@cornell.edu)
- Rahul Kidambi, Amazon Search & AI (rk773@cornell.edu)
- Wen Sun, Department of Computer Science, Cornell University (ws455@cornell.edu)
Pseudocode: Yes
LLM Response: Algorithm 1 (Framework for model-based Imitation Learning with offline data, MILO) and Algorithm 2 (A practical instantiation of MILO).
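At a high level, the framework named in Algorithm 1 fits a dynamics model on the offline data once, then alternates adversarial discriminator updates with policy improvement inside that learned model. The Python sketch below is a loose illustration under stated assumptions, not the authors' Algorithm 1: EnsembleDynamics, Discriminator, Policy, and policy_gradient_step are hypothetical placeholder components, and the disagreement term stands in for the paper's ensemble-based uncertainty measure.

```python
# Illustrative sketch of a MILO-style loop; NOT the authors' implementation.
# EnsembleDynamics, Discriminator, Policy, and policy_gradient_step are
# hypothetical placeholders for standard components.

def milo(offline_data, expert_data, n_iters=100, lam=1.0):
    """Model-based imitation learning with offline data (high-level sketch).

    offline_data: (s, a, s') transitions from suboptimal behavior policies.
    expert_data:  a small set of expert (s, a) pairs.
    """
    # 1) Fit an ensemble of dynamics models on the offline data (done once).
    model = EnsembleDynamics()
    model.fit(offline_data)

    policy, disc = Policy(), Discriminator()

    for _ in range(n_iters):
        # 2) Roll out the current policy inside the learned model.
        rollouts = model.rollout(policy)

        # 3) Inner maximization: train the discriminator to separate expert
        #    (s, a) pairs from model-rollout (s, a) pairs (IPM/GAIL-style).
        disc.update(expert_data, rollouts)

        # 4) Shape the reward with an ensemble-disagreement term so the
        #    policy stays where the offline data, and hence the model, is
        #    reliable: the safeguard for partial coverage.
        def reward(s, a):
            return -disc.score(s, a) - lam * model.disagreement(s, a)

        # 5) Outer minimization: one policy-improvement step in the model.
        policy = policy_gradient_step(policy, model, reward)

    return policy
```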
Open Source Code: Yes
LLM Response: Source code is provided at https://github.com/jdchang1/milo.
Open Datasets: Yes
LLM Response: We evaluate MILO on five environments from OpenAI Gym [11] simulated with MuJoCo [67]: Hopper-v2, Walker2d-v2, HalfCheetah-v2, Ant-v2, and Humanoid-v2.
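The -v2 environment IDs imply the legacy gym API with mujoco-py bindings; exact package versions are not stated in the paper (see Software Dependencies below), so the snippet assumes that older API:

```python
# Instantiate the five MuJoCo benchmarks named above. Assumes the legacy
# `gym` + mujoco-py stack implied by the -v2 IDs; versions are unstated.
import gym

for env_id in ["Hopper-v2", "Walker2d-v2", "HalfCheetah-v2",
               "Ant-v2", "Humanoid-v2"]:
    env = gym.make(env_id)
    obs = env.reset()  # legacy gym API: reset() returns only the observation
    print(env_id, "obs:", env.observation_space.shape,
          "act:", env.action_space.shape)
    env.close()
```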
Dataset Splits: No
LLM Response: The paper mentions collecting an 'expert dataset' and an 'offline static dataset' but does not specify explicit training, validation, and test splits with percentages or sample counts in the conventional supervised-learning sense; evaluation is instead performed by running the learned policies in the simulation environments.
Hardware Specification: No
LLM Response: The paper does not provide any specific details about the hardware used to run the experiments, such as GPU/CPU models, memory, or cloud computing instances.
Software Dependencies: No
LLM Response: The paper mentions using OpenAI Gym [11] and MuJoCo [67] environments and neural networks, but it does not provide specific version numbers for these or any other software libraries, frameworks, or dependencies used in the experiments.
Experiment Setup: Yes
LLM Response: See the appendix for details on hyperparameters, environments, and dataset composition.