Mitigating Covariate Shift in Imitation Learning via Offline Data With Partial Coverage
Authors: Jonathan Chang, Masatoshi Uehara, Dhruv Sreenivas, Rahul Kidambi, Wen Sun
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Complementing our theory results, we also demonstrate that a practical implementation of our approach mitigates covariate shift on benchmark MuJoCo continuous control tasks. We demonstrate that with behavior policies whose performances are less than half of that of the expert, MILO still successfully imitates with an extremely low number of expert state-action pairs while traditional offline IL methods such as behavior cloning (BC) fail completely. Source code is provided at https://github.com/jdchang1/milo. |
| Researcher Affiliation | Collaboration | Jonathan D. Chang (Department of Computer Science, Cornell University, jdc396@cornell.edu); Masatoshi Uehara (Department of Computer Science, Cornell University, mu223@cornell.edu); Dhruv Sreenivas (Department of Computer Science, Cornell University, ds844@cornell.edu); Rahul Kidambi (Amazon Search & AI, rk773@cornell.edu); Wen Sun (Department of Computer Science, Cornell University, ws455@cornell.edu) |
| Pseudocode | Yes | Algorithm 1 Framework for model-based Imitation Learning with offline data (MILO) and Algorithm 2 A practical instantiation of MILO |
| Open Source Code | Yes | Source code is provided at https://github.com/jdchang1/milo. |
| Open Datasets | Yes | We evaluate MILO on five environments from OpenAI Gym [11] simulated with MuJoCo [67]: Hopper-v2, Walker2d-v2, HalfCheetah-v2, Ant-v2, and Humanoid-v2. |
| Dataset Splits | No | The paper mentions collecting 'expert dataset' and 'offline static dataset' but does not specify explicit training, validation, and test splits with percentages or sample counts for these datasets in the conventional sense of supervised learning. The evaluation is done by running policies in the simulation environments. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU/CPU models, memory, or cloud computing instances. |
| Software Dependencies | No | The paper mentions using OpenAI Gym [11] and MuJoCo [67] environments and neural networks, but it does not provide specific version numbers for these or any other software libraries, frameworks, or dependencies used in the experiments. |
| Experiment Setup | Yes | The paper's appendix provides details on hyperparameters, environment configurations, and the composition of the expert and offline datasets. |
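The paper repeatedly compares MILO against behavior cloning (BC), the standard offline imitation baseline that reduces imitation to supervised regression on expert state-action pairs. As a hedged illustration of what that baseline does (not the paper's implementation; all data below is synthetic and the linear policy class is an assumption chosen for brevity):

```python
import numpy as np

# Minimal behavior-cloning sketch: fit a policy to expert state-action
# pairs by supervised regression. A linear policy and synthetic data are
# illustrative assumptions; the paper's BC baseline uses neural networks
# on MuJoCo observations.

rng = np.random.default_rng(0)
expert_W = rng.normal(size=(3, 2))       # hypothetical expert mapping: state -> action
states = rng.normal(size=(500, 3))       # "expert" states
actions = states @ expert_W              # noiseless expert actions

# BC = least-squares regression from states to expert actions.
W_bc, *_ = np.linalg.lstsq(states, actions, rcond=None)

def bc_policy(s):
    """Policy learned purely from expert demonstrations, with no
    environment interaction -- hence its vulnerability to covariate shift
    once the learned policy drifts off the expert's state distribution."""
    return s @ W_bc

mse = float(np.mean((bc_policy(states) - actions) ** 2))
print(f"BC training MSE: {mse:.2e}")
```

BC minimizes error only on states the expert visited; MILO's contribution is using an offline dataset (with partial coverage) to learn a dynamics model that keeps the imitator on-distribution.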