Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Batch Reinforcement Learning with Hyperparameter Gradients
Authors: Byungjun Lee, Jongmin Lee, Peter Vrancx, Dongho Kim, Kee-Eung Kim
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that BOPAH outperforms other batch reinforcement learning algorithms in tabular and continuous control tasks, by finding a good balance to the trade-off between adhering to the data collection policy and pursuing the possible policy improvement. |
| Researcher Affiliation | Collaboration | ¹School of Computing, KAIST, Daejeon, South Korea; ²PROWLER.io; ³Graduate School of AI, KAIST, Daejeon, South Korea. |
| Pseudocode | No | The paper describes the algorithms and procedures in paragraph text without a dedicated pseudocode or algorithm block. |
| Open Source Code | No | The paper states "We used their published code and hyperparameters (Φ = 0.05 for BCQ and ϵ = 0.05 for BEAR-QL) therein for obtaining experimental results," referring to third-party code, but does not provide concrete access to their own source code for BOPAH/AC-BOPAH. |
| Open Datasets | Yes | In this experiment, we evaluate the effectiveness of AC-BOPAH on continuous control tasks, using the MuJoCo environments in the Open AI gym (Todorov et al., 2012; Brockman et al., 2016). |
| Dataset Splits | Yes | BOPAH starts by dividing the entire batch data $D = \{(s_i, a_i, s'_i, r_i)\}_{i=1}^{N}$ into two mutually exclusive sets $D_{\text{train}}$ and $D_{\text{valid}}$. |
| Hardware Specification | No | No specific hardware details (e.g., CPU, GPU models, memory) used for experiments are provided in the paper. |
| Software Dependencies | No | The paper mentions software like "Open AI gym" and algorithms such as SAC, BCQ, and BEAR-QL, but does not provide specific version numbers for these or other software dependencies like deep learning frameworks. |
| Experiment Setup | Yes | We used their published code and hyperparameters (Φ = 0.05 for BCQ and ϵ = 0.05 for BEAR-QL) therein for obtaining experimental results. |
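
The Dataset Splits row quotes the paper as dividing the batch of transitions into two mutually exclusive sets, one for training and one for validation. The sketch below illustrates one way to make such a split. It is not the authors' code: the function name `split_batch`, the `valid_fraction` and `seed` parameters, and the uniform random shuffle are all assumptions made for illustration, and the quoted text does not state the ratio or sampling scheme actually used.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_batch(
    batch: Sequence[T],
    valid_fraction: float = 0.5,  # hypothetical default; the ratio is not given in the quoted text
    seed: int = 0,
) -> Tuple[List[T], List[T]]:
    """Split a batch of transitions into disjoint train and validation lists."""
    if not 0.0 < valid_fraction < 1.0:
        raise ValueError("valid_fraction must be strictly between 0 and 1")

    # Shuffle indices with a local RNG so the split is repeatable for a given seed.
    indices = list(range(len(batch)))
    random.Random(seed).shuffle(indices)

    # The first n_valid shuffled indices go to validation; the rest go to training.
    n_valid = int(round(len(batch) * valid_fraction))
    valid_idx = set(indices[:n_valid])

    # Iterate in the original order so each subset keeps the batch ordering.
    train = [batch[i] for i in range(len(batch)) if i not in valid_idx]
    valid = [batch[i] for i in range(len(batch)) if i in valid_idx]
    return train, valid
```

Each transition index lands in exactly one of the two lists, so the subsets are disjoint and together cover the whole batch. For example, `split_batch(transitions, valid_fraction=0.3)` on ten `(s, a, s_next, r)` tuples returns seven training and three validation transitions.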