BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning

Authors: Xinyue Chen, Zijian Zhou, Zheng Wang, Che Wang, Yanqiu Wu, Keith Ross

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | For the MuJoCo benchmark, we provide a comprehensive experimental study of BAIL, comparing its performance to four other batch Q-learning and imitation-learning schemes for a large variety of batch datasets. Our experiments show that BAIL's performance is much higher than the other schemes, and is also computationally much faster than the batch Q-learning schemes.
Researcher Affiliation | Academia | New York University Shanghai; New York University
Pseudocode | Yes | We provide detailed pseudo-code for both BAIL and Progressive BAIL in the supplementary materials.
Open Source Code | Yes | We provide public open source code for reproducibility: https://github.com/lanyavik/BAIL
Open Datasets | Yes | Because the BCQ and BEAR codes are publicly available, we are able to make a careful and comprehensive comparison of the performance of BAIL, BCQ, BEAR, MARWIL and vanilla Behavior Cloning (BC) using the MuJoCo benchmark. We will also make our datasets publicly available for future benchmarking.
Dataset Splits | Yes | We split the data into a training set and validation set.
Hardware Specification | No | The paper mentions running experiments "on a CPU node" but does not provide specific details such as CPU model, number of cores, GPU models, or memory specifications.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used (e.g., Python version, PyTorch/TensorFlow version, specific reinforcement-learning frameworks).
Experiment Setup | Yes | For a fair comparison, we keep all hyper-parameters fixed for all experiments, instead of fine-tuning for each one. In practice, we find K = 1000 works well for all environments tested. In this paper we use p = 25% for all environments and batches. (See the sketch below for how K and p enter the algorithm.)
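
The two constants reported above correspond to BAIL's upper-envelope regression (penalty coefficient K) and its best-action selection step (keeping the top p% of the batch). Below is a minimal, hedged sketch of how they could be used, assuming the penalized-L2 envelope loss and ratio-based selection described in the paper; the function names, tensor shapes, and the `clamp` safeguard are illustrative choices, not taken from the authors' repository.

```python
import torch

K = 1000.0  # envelope penalty coefficient (value reported in the paper)
p = 0.25    # fraction of best (state, action) pairs kept (value reported in the paper)

def upper_envelope_loss(v_pred: torch.Tensor, returns: torch.Tensor) -> torch.Tensor:
    """Penalized L2 regression: squared errors where V(s) < G are weighted by K,
    pushing V(s) toward an upper envelope of the Monte Carlo returns G."""
    diff = v_pred - returns
    weight = torch.where(diff < 0, torch.full_like(diff, K), torch.ones_like(diff))
    return (weight * diff.pow(2)).mean()

def select_best_actions(returns: torch.Tensor, v_env: torch.Tensor, frac: float = p) -> torch.Tensor:
    """Keep the top `frac` of transitions ranked by G / V(s). With positive
    envelope values this matches selecting pairs with G >= x * V(s), where x
    is chosen so that a fraction `frac` of the data is retained."""
    ratios = returns / v_env.clamp(min=1e-6)  # assumes positive envelope values
    k = max(1, int(frac * returns.numel()))
    return torch.topk(ratios, k).indices  # indices of the selected batch entries
```

Under this reading, K = 1000 strongly penalizes the envelope dipping below any observed return, and p = 25% keeps the quarter of (state, action) pairs whose returns come closest to the envelope, which are then passed to behavior cloning.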