Minimax Optimal Online Imitation Learning via Replay Estimation
Authors: Gokul Swamy, Nived Rajaraman, Matthew Peng, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu, Jiantao Jiao, Kannan Ramchandran
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement multiple instantiations of our approach on several continuous control tasks and find that we are able to significantly improve policy performance across a variety of dataset sizes. |
| Researcher Affiliation | Collaboration | Gokul Swamy, Carnegie Mellon University (gswamy@cmu.edu); Nived Rajaraman, UC Berkeley (nived.rajaraman@berkeley.edu); Matthew Peng, UC Berkeley; Sanjiban Choudhury, Cornell University; J. Andrew Bagnell, Aurora Innovation and Carnegie Mellon University; Zhiwei Steven Wu, Carnegie Mellon University; Jiantao Jiao, UC Berkeley; Kannan Ramchandran, UC Berkeley |
| Pseudocode | Yes | Algorithm 1: Replay Estimation (RE). A hedged sketch of the idea appears below the table. |
| Open Source Code | Yes | We release our code at https://github.com/gkswamy98/replay_est. |
| Open Datasets | Yes | We now quantify the empirical benefits of RE on several continuous control tasks from the PyBullet suite Coumans and Bai [2016–2019]. |
| Dataset Splits | No | The paper mentions using a 'fixed set of expert demonstrations' and 'relatively few demonstrations (N_exp = 20)', but does not provide specific train/validation/test splits by percentage or sample count, nor does it refer to predefined splits from cited sources for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper mentions software like the 'PyBullet suite' and 'Stable-Baselines3' and references various algorithms (e.g., Soft Actor-Critic, PPO), but it does not provide specific version numbers for these software components or libraries, which is necessary for reproducibility. |
| Experiment Setup | No | The paper mentions the number of expert trajectories (N_exp) and replay rollouts (N_replay), but it does not provide specific experimental setup details such as concrete hyperparameter values (e.g., learning rates, batch sizes, number of epochs) or detailed training configurations. |
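
For context on the pseudocode row above: Algorithm 1 (Replay Estimation) smooths the expert's empirical state-action distribution by re-executing an estimate of the expert policy in the simulator and pooling those rollouts with the original demonstrations. The sketch below is a minimal illustration of that idea, not the authors' released implementation (see the linked repository for that); the Gymnasium-style environment API, the `bc_fit` helper, and the trajectory format are our assumptions.

```python
# Minimal sketch of the replay-estimation idea (Algorithm 1).
# Assumptions: a Gymnasium-style environment and a user-supplied
# behavior-cloning routine `bc_fit` that maps demonstrations to a policy.

def replay_estimate(env, expert_trajs, n_replay, bc_fit):
    """Smooth the expert's empirical state-action distribution by
    re-executing a policy fit to the demonstrations in the simulator."""
    # 1. Fit a proxy for the expert (e.g., by behavior cloning).
    pi_hat = bc_fit(expert_trajs)  # assumed helper: demos -> (obs -> action)

    # 2. Roll out the proxy policy n_replay times to collect replay data.
    replay_trajs = []
    for _ in range(n_replay):
        obs, _ = env.reset()
        traj, done = [], False
        while not done:
            action = pi_hat(obs)
            obs_next, _, terminated, truncated, _ = env.step(action)
            traj.append((obs, action))
            obs = obs_next
            done = terminated or truncated
        replay_trajs.append(traj)

    # 3. The union of demonstrations and replay rollouts is the smoothed,
    #    lower-variance estimate of the expert distribution that a
    #    downstream moment-matching imitation learner trains against.
    return expert_trajs + replay_trajs
```

The return value is left generic on purpose: in the paper's experiments the pooled data would feed an online moment-matching imitation learner, but any downstream learner that consumes state-action pairs could use it.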