Off-Policy Deep Reinforcement Learning without Exploration
Authors: Scott Fujimoto, David Meger, Doina Precup
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present the first continuous control deep reinforcement learning algorithm which can learn effectively from arbitrary, fixed batch data, and empirically demonstrate the quality of its behavior in several tasks. |
| Researcher Affiliation | Academia | ¹Department of Computer Science, McGill University, Montreal, Canada. ²Mila Québec AI Institute. |
| Pseudocode | Yes | Algorithm 1 BCQ |
| Open Source Code | Yes | To ensure reproducibility, we provide precise experimental and implementation details, and our code is made available (https://github.com/sfujim/BCQ). |
| Open Datasets | Yes | Our practical experiments examine three different batch settings in OpenAI Gym's Hopper-v1 environment (Todorov et al., 2012; Brockman et al., 2016) |
| Dataset Splits | No | The paper describes data collection and usage in different batch settings but does not explicitly provide train/validation/test dataset splits or cross-validation details. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'OpenAI Gym' and 'MuJoCo environments' but does not provide specific version numbers for these or any other ancillary software dependencies. |
| Experiment Setup | Yes | Exact implementation and experimental details are provided in the Supplementary Material. |
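The pseudocode row above refers to the paper's Algorithm 1 (BCQ), whose action-selection step samples candidate actions from a generative model, perturbs them within a small range, and acts greedily over their Q-values. A minimal NumPy sketch of that step follows; `sample_actions`, `perturb`, and `q_value` are hypothetical stand-ins for the paper's conditional VAE, perturbation network, and learned Q-network, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def bcq_select_action(state, sample_actions, perturb, q_value, n=10, phi=0.05):
    """Sketch of BCQ's batch-constrained action selection.

    sample_actions: stand-in for the conditional generative model G(state),
        returning n candidate actions resembling those in the batch.
    perturb: stand-in for the perturbation network xi(state, action), whose
        output is clipped to the range [-phi, phi].
    q_value: stand-in for the learned Q-network.
    """
    candidates = sample_actions(state, n)                      # n candidate actions
    adjusted = candidates + np.clip(perturb(state, candidates), -phi, phi)
    values = np.array([q_value(state, a) for a in adjusted])   # score each candidate
    return adjusted[np.argmax(values)]                         # act greedily over them

# Toy stand-ins (purely illustrative):
state = np.zeros(3)
sample_actions = lambda s, n: rng.uniform(-1.0, 1.0, size=(n, 2))
perturb = lambda s, a: 0.1 * np.ones_like(a)   # gets clipped to phi = 0.05
q_value = lambda s, a: -np.sum(a ** 2)         # prefers small-magnitude actions

action = bcq_select_action(state, sample_actions, perturb, q_value)
```

Restricting candidates to generated, batch-like actions is what lets the algorithm avoid querying Q-values for out-of-distribution actions, which the paper identifies as the source of extrapolation error in the fixed-batch setting.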