Minimax Optimal Online Imitation Learning via Replay Estimation
Authors: Gokul Swamy, Nived Rajaraman, Matthew Peng, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu, Jiantao Jiao, Kannan Ramchandran
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement multiple instantiations of our approach on several continuous control tasks and find that we are able to significantly improve policy performance across a variety of dataset sizes. |
| Researcher Affiliation | Collaboration | Gokul Swamy, Carnegie Mellon University (gswamy@cmu.edu); Nived Rajaraman, UC Berkeley (nived.rajaraman@berkeley.edu); Matthew Peng, UC Berkeley; Sanjiban Choudhury, Cornell University; J. Andrew Bagnell, Aurora Innovation and Carnegie Mellon University; Zhiwei Steven Wu, Carnegie Mellon University; Jiantao Jiao, UC Berkeley; Kannan Ramchandran, UC Berkeley |
| Pseudocode | Yes | Algorithm 1: Replay Estimation (RE). A hedged sketch of the idea appears below the table. |
| Open Source Code | Yes | We release our code at https://github.com/gkswamy98/replay_est. |
| Open Datasets | Yes | We now quantify the empirical benefits of RE on several continuous control tasks from the PyBullet suite Coumans and Bai [2016–2019]. |
| Dataset Splits | No | The paper mentions using a 'fixed set of expert demonstrations' and 'relatively few demonstrations (N_exp = 20)', but does not provide specific train/validation/test splits by percentage or sample count, nor does it refer to predefined splits from cited sources for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper mentions software like the 'PyBullet suite' and 'Stable-Baselines3' and references various algorithms (e.g., Soft Actor-Critic, PPO), but it does not provide specific version numbers for these software components or libraries, which is necessary for reproducibility. |
| Experiment Setup | No | The paper mentions the number of expert trajectories (N_exp) and replay rollouts (N_replay), but it does not provide specific experimental setup details such as concrete hyperparameter values (e.g., learning rates, batch sizes, number of epochs) or detailed training configurations. |
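
For context on the pseudocode row above: Algorithm 1 (Replay Estimation) smooths the expert's empirical state-action distribution by re-executing an estimate of the expert policy in the simulator and pooling those rollouts with the original demonstrations. The sketch below is a minimal illustration of that idea, not the authors' released implementation (see the linked repository for that); the Gymnasium-style environment API, the `bc_fit` helper, and the trajectory format are our assumptions.

```python
# Minimal sketch of the replay-estimation idea (Algorithm 1).
# Assumptions: a Gymnasium-style environment and a user-supplied
# behavior-cloning routine `bc_fit` that maps demonstrations to a policy.

def replay_estimate(env, expert_trajs, n_replay, bc_fit):
    """Smooth the expert's empirical state-action distribution by
    re-executing a policy fit to the demonstrations in the simulator."""
    # 1. Fit a proxy for the expert (e.g., by behavior cloning).
    pi_hat = bc_fit(expert_trajs)  # assumed helper: demos -> (obs -> action)

    # 2. Roll out the proxy policy n_replay times to collect replay data.
    replay_trajs = []
    for _ in range(n_replay):
        obs, _ = env.reset()
        traj, done = [], False
        while not done:
            action = pi_hat(obs)
            obs_next, _, terminated, truncated, _ = env.step(action)
            traj.append((obs, action))
            obs = obs_next
            done = terminated or truncated
        replay_trajs.append(traj)

    # 3. The union of demonstrations and replay rollouts is the smoothed,
    #    lower-variance estimate of the expert distribution that a
    #    downstream moment-matching imitation learner trains against.
    return expert_trajs + replay_trajs
```

The return value is left generic on purpose: in the paper's experiments the pooled data would feed an online moment-matching imitation learner, but any downstream learner that consumes state-action pairs could use it.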