BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning
Authors: Xinyue Chen, Zijian Zhou, Zheng Wang, Che Wang, Yanqiu Wu, Keith Ross
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For the MuJoCo benchmark, we provide a comprehensive experimental study of BAIL, comparing its performance to four other batch Q-learning and imitation-learning schemes for a large variety of batch datasets. Our experiments show that BAIL's performance is much higher than the other schemes, and is also computationally much faster than the batch Q-learning schemes. |
| Researcher Affiliation | Academia | New York University Shanghai; New York University |
| Pseudocode | Yes | We provide detailed pseudo-code for both BAIL and Progressive BAIL in the supplementary materials. |
| Open Source Code | Yes | We provide public open source code for reproducibility: https://github.com/lanyavik/BAIL |
| Open Datasets | Yes | Because the BCQ and BEAR codes are publicly available, we are able to make a careful and comprehensive comparison of the performance of BAIL, BCQ, BEAR, MARWIL and vanilla Behavior Cloning (BC) using the MuJoCo benchmark. We will also make our datasets publicly available for future benchmarking. |
| Dataset Splits | Yes | We split the data into a training set and validation set. |
| Hardware Specification | No | The paper mentions running experiments "on a CPU node" but does not provide specific details such as CPU model, number of cores, GPU models, or memory specifications. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used (e.g., Python version, PyTorch/TensorFlow version, specific Reinforcement Learning frameworks). |
| Experiment Setup | Yes | For a fair comparison, we keep all hyper-parameters fixed for all experiments, instead of fine-tuning for each one. In practice, we find K = 1000 works well for all environments tested. In this paper we use p = 25% for all environments and batches. (A hedged sketch of how these two hyperparameters are used appears below the table.) |
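
The two hyperparameters quoted above map onto BAIL's two core steps as described in the paper: K = 1000 is the penalty constant in the upper-envelope regression loss, and p = 25% is the fraction of best (state, action) pairs kept for behavior cloning. The sketch below illustrates both steps under stated assumptions; it is not the released code (https://github.com/lanyavik/BAIL). The `UpperEnvelope` and `select_best_actions` names, the network width, and the positivity clamp on V(s) are illustrative choices, and the Monte Carlo returns G_i are assumed to be precomputed from the batch.

```python
import numpy as np
import torch
import torch.nn as nn

class UpperEnvelope(nn.Module):
    """Small MLP regressor V(s) for the upper envelope of Monte Carlo returns.
    (Architecture is illustrative, not the paper's exact network.)"""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s):
        return self.net(s).squeeze(-1)

def envelope_loss(v_pred, returns, K=1000.0):
    """Squared error where underestimation (V < G) is penalized K times more,
    pushing V(s) toward an upper envelope of the observed returns."""
    err = v_pred - returns
    weight = torch.where(err < 0, torch.full_like(err, K), torch.ones_like(err))
    return (weight * err ** 2).mean()

def select_best_actions(states, actions, returns, v_model, p=0.25):
    """Keep the top-p fraction of (s, a) pairs ranked by the ratio G / V(s),
    i.e. pairs with G >= x * V(s) for a data-dependent threshold x."""
    with torch.no_grad():
        v = v_model(torch.as_tensor(states, dtype=torch.float32)).numpy()
    # Clamping V(s) away from zero is a simplification for this sketch;
    # MuJoCo returns are typically positive.
    ratios = returns / np.maximum(v, 1e-8)
    x = np.quantile(ratios, 1.0 - p)  # choose x so roughly a fraction p survives
    keep = ratios >= x
    return states[keep], actions[keep]
```

Behavior cloning on the surviving pairs (ordinary supervised regression of actions on states) then yields the final policy. Per the Dataset Splits row, the paper holds out a validation set when fitting the envelope, which serves for early stopping of that regression.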