Batch Reinforcement Learning with Hyperparameter Gradients
Authors: Byungjun Lee, Jongmin Lee, Peter Vrancx, Dongho Kim, Kee-Eung Kim
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that BOPAH outperforms other batch reinforcement learning algorithms in tabular and continuous control tasks, by finding a good balance to the trade-off between adhering to the data collection policy and pursuing the possible policy improvement. |
| Researcher Affiliation | Collaboration | (1) School of Computing, KAIST, Daejeon, South Korea; (2) PROWLER.io; (3) Graduate School of AI, KAIST, Daejeon, South Korea. |
| Pseudocode | No | The paper describes the algorithms and procedures in paragraph text without a dedicated pseudocode or algorithm block. |
| Open Source Code | No | The paper states "We used their published code and hyperparameters (Φ = 0.05 for BCQ and ϵ = 0.05 for BEAR-QL) therein for obtaining experimental results," referring to third-party code, but does not provide concrete access to their own source code for BOPAH/AC-BOPAH. |
| Open Datasets | Yes | In this experiment, we evaluate the effectiveness of AC-BOPAH on continuous control tasks, using the MuJoCo environments in the Open AI gym (Todorov et al., 2012; Brockman et al., 2016). |
| Dataset Splits | Yes | BOPAH starts by dividing the entire batch data $D = \{(s_i, a_i, s'_i, r_i)\}_{i=1}^{N}$ into two mutually exclusive sets $D_{\text{train}}$ and $D_{\text{valid}}$. |
| Hardware Specification | No | No specific hardware details (e.g., CPU, GPU models, memory) used for experiments are provided in the paper. |
| Software Dependencies | No | The paper mentions software like "Open AI gym" and algorithms such as SAC, BCQ, and BEAR-QL, but does not provide specific version numbers for these or other software dependencies like deep learning frameworks. |
| Experiment Setup | Yes | We used their published code and hyperparameters (Φ = 0.05 for BCQ and ϵ = 0.05 for BEAR-QL) therein for obtaining experimental results. |
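
The Dataset Splits row quotes the paper's description of dividing the batch into two mutually exclusive sets. A minimal Python sketch of such a split is given below. The function name `split_batch`, the `valid_fraction` and `seed` parameters, and the use of a uniform random shuffle are assumptions made for illustration; the quoted text does not state the split ratio or how transitions are assigned, and this is not the authors' code.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_batch(
    batch: Sequence[T],
    valid_fraction: float = 0.5,  # assumed ratio; not stated in the quoted text
    seed: int = 0,                # assumed; fixes the shuffle for repeatability
) -> Tuple[List[T], List[T]]:
    """Split a batch of transitions into disjoint train and validation lists."""
    if not 0.0 < valid_fraction < 1.0:
        raise ValueError("valid_fraction must be strictly between 0 and 1")

    # Shuffle indices rather than the data so the input is left untouched.
    indices = list(range(len(batch)))
    random.Random(seed).shuffle(indices)

    # The first n_valid shuffled indices form the validation set.
    n_valid = int(round(len(batch) * valid_fraction))
    valid_idx = set(indices[:n_valid])

    # Every index lands in exactly one of the two lists, preserving input order.
    d_train = [batch[i] for i in range(len(batch)) if i not in valid_idx]
    d_valid = [batch[i] for i in range(len(batch)) if i in valid_idx]
    return d_train, d_valid


# Toy transitions (s, a, s', r) with integer states, for demonstration only.
toy_batch = [(i, i % 2, i + 1, float(i)) for i in range(10)]
d_train, d_valid = split_batch(toy_batch, valid_fraction=0.3, seed=1)
```

Splitting by index keeps each transition in exactly one set, which matches the "mutually exclusive" requirement in the quoted sentence.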