Off-Policy Deep Reinforcement Learning without Exploration

Authors: Scott Fujimoto, David Meger, Doina Precup

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We present the first continuous control deep reinforcement learning algorithm which can learn effectively from arbitrary, fixed batch data, and empirically demonstrate the quality of its behavior in several tasks."
Researcher Affiliation | Academia | "1Department of Computer Science, McGill University, Montreal, Canada 2Mila Québec AI Institute."
Pseudocode | Yes | "Algorithm 1 BCQ"
Open Source Code | Yes | "To ensure reproducibility, we provide precise experimental and implementation details, and our code is made available (https://github.com/sfujim/BCQ)."
Open Datasets | Yes | "Our practical experiments examine three different batch settings in OpenAI Gym's Hopper-v1 environment (Todorov et al., 2012; Brockman et al., 2016)"
Dataset Splits | No | The paper describes data collection and usage in different batch settings but does not explicitly provide train/validation/test dataset splits or cross-validation details.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions using OpenAI Gym and MuJoCo environments but does not provide specific version numbers for these or any other ancillary software dependencies.
Experiment Setup | Yes | "Exact implementation and experimental details are provided in the Supplementary Material."
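The table above notes that the paper includes pseudocode (Algorithm 1, BCQ). The core idea of batch-constrained Q-learning is to restrict the greedy argmax over Q-values to candidate actions that a generative model, trained on the fixed batch, considers likely. The toy sketch below illustrates only that selection rule; the candidate sampler and critic here are hypothetical stand-ins (in BCQ proper they are a learned VAE plus perturbation network and a learned Q-network), not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_batch_like_actions(state, n=10):
    # Stand-in for BCQ's generative model: in the real algorithm a VAE
    # trained on the batch proposes actions similar to batch behavior.
    return rng.normal(loc=0.0, scale=0.1, size=(n, 1))

def q_value(state, action):
    # Toy critic in place of a learned Q-network: prefers actions near 0.05.
    return -float((action[0] - 0.05) ** 2)

def bcq_select_action(state, n_candidates=10):
    """BCQ-style action selection: take the argmax of Q only over
    candidate actions the generative model deems plausible under the
    batch, rather than over the full action space."""
    candidates = sample_batch_like_actions(state, n_candidates)
    scores = [q_value(state, a) for a in candidates]
    return candidates[int(np.argmax(scores))]

action = bcq_select_action(state=np.zeros(3))
print(action.shape)  # (1,)
```

Constraining the argmax this way is what lets the method learn from arbitrary fixed data without exploration: it avoids querying Q-values for out-of-distribution actions, where the critic's estimates are unreliable.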