Minimax Optimal Online Imitation Learning via Replay Estimation

Authors: Gokul Swamy, Nived Rajaraman, Matthew Peng, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu, Jiantao Jiao, Kannan Ramchandran

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We implement multiple instantiations of our approach on several continuous control tasks and find that we are able to significantly improve policy performance across a variety of dataset sizes.
Researcher Affiliation | Collaboration | Gokul Swamy (Carnegie Mellon University, gswamy@cmu.edu); Nived Rajaraman (UC Berkeley, nived.rajaraman@berkeley.edu); Matthew Peng (UC Berkeley); Sanjiban Choudhury (Cornell University); J. Andrew Bagnell (Aurora Innovation and Carnegie Mellon University); Zhiwei Steven Wu (Carnegie Mellon University); Jiantao Jiao (UC Berkeley); Kannan Ramchandran (UC Berkeley)
Pseudocode | Yes | Algorithm 1: Replay Estimation (RE). A hedged sketch of the idea is given after this table.
Open Source Code | Yes | We release our code at https://github.com/gkswamy98/replay_est.
Open Datasets | Yes | We now quantify the empirical benefits of RE on several continuous control tasks from the PyBullet suite [Coumans and Bai, 2016-2019].
Dataset Splits | No | The paper mentions using a 'fixed set of expert demonstrations' and 'relatively few demonstrations (Nexp 20)', but does not provide specific train/validation/test splits by percentage or sample count, nor does it point to predefined splits from cited sources.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running the experiments.
Software Dependencies | No | The paper mentions software such as the PyBullet suite and Stable-Baselines3 and references several algorithms (e.g., Soft Actor-Critic, PPO), but it does not give version numbers for these libraries, which are necessary for reproducibility.
Experiment Setup | No | The paper mentions the number of expert trajectories (Nexp) and replay rollouts (Nreplay), but it does not provide concrete experimental setup details such as hyperparameter values (e.g., learning rates, batch sizes, number of epochs) or detailed training configurations.
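
Since the table marks pseudocode as available but concrete hyperparameters as absent, the following is a minimal sketch of the replay-estimation idea, assuming a Gym-style PyBullet environment and that replay rollouts are generated by rolling out a policy fit to the expert demonstrations. The names fit_bc_policy and collect_rollout and the environment id are hypothetical illustrations, not the authors' released API; their actual implementation is at https://github.com/gkswamy98/replay_est.

    # Minimal sketch of the Replay Estimation (RE) idea -- an assumption-based
    # illustration, not the paper's Algorithm 1 or the released code.
    import gym
    import pybullet_envs  # noqa: F401  (registers the PyBullet control tasks)

    def collect_rollout(env, policy, max_steps=1000):
        """Roll `policy` out once and return its (state, action) pairs."""
        traj, obs = [], env.reset()
        for _ in range(max_steps):
            action = policy(obs)
            traj.append((obs, action))
            obs, _, done, _ = env.step(action)  # old Gym 4-tuple step API
            if done:
                break
        return traj

    def replay_estimation(expert_trajs, env, fit_bc_policy, n_replay=100):
        """Augment a small expert dataset with rollouts of a cloned policy.

        1. Fit a policy to the expert demonstrations (behavioral cloning).
        2. Roll that policy out in the simulator n_replay times.
        3. Return the expert data pooled with the replay rollouts.
        """
        bc_policy = fit_bc_policy(expert_trajs)
        replay_trajs = [collect_rollout(env, bc_policy) for _ in range(n_replay)]
        return expert_trajs + replay_trajs

    # Hypothetical usage: env = gym.make("HopperBulletEnv-v0")

Under this reading, the replay rollouts smooth the empirical estimate of the expert's visitation distribution when only a few demonstrations are available, which a downstream moment-matching imitation learner can then match.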