Batch Reinforcement Learning with Hyperparameter Gradients

Authors: Byungjun Lee, Jongmin Lee, Peter Vrancx, Dongho Kim, Kee-Eung Kim

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that BOPAH outperforms other batch reinforcement learning algorithms in tabular and continuous control tasks, by finding a good balance to the trade-off between adhering to the data collection policy and pursuing the possible policy improvement.
Researcher Affiliation | Collaboration | (1) School of Computing, KAIST, Daejeon, South Korea; (2) PROWLER.io; (3) Graduate School of AI, KAIST, Daejeon, South Korea.
Pseudocode | No | The paper describes the algorithms and procedures in paragraph text without a dedicated pseudocode or algorithm block.
Open Source Code | No | The paper states "We used their published code and hyperparameters (Φ = 0.05 for BCQ and ϵ = 0.05 for BEAR-QL) therein for obtaining experimental results," referring to third-party code, but does not provide concrete access to their own source code for BOPAH/AC-BOPAH.
Open Datasets | Yes | In this experiment, we evaluate the effectiveness of AC-BOPAH on continuous control tasks, using the MuJoCo environments in the Open AI gym (Todorov et al., 2012; Brockman et al., 2016).
Dataset Splits | Yes | BOPAH starts by dividing the entire batch data D = {(s_i, a_i, s'_i, r_i)}_{i=1}^{N} into two mutually exclusive sets D_train and D_valid (see the split sketch after this table).
Hardware Specification | No | No specific hardware details (e.g., CPU, GPU models, memory) used for experiments are provided in the paper.
Software Dependencies | No | The paper mentions software like "Open AI gym" and algorithms such as SAC, BCQ, and BEAR-QL, but does not provide specific version numbers for these or other software dependencies like deep learning frameworks.
Experiment Setup | Yes | We used their published code and hyperparameters (Φ = 0.05 for BCQ and ϵ = 0.05 for BEAR-QL) therein for obtaining experimental results (see the configuration sketch after this table).
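
To make the Dataset Splits row concrete, here is a minimal Python sketch of dividing a batch of transitions into two mutually exclusive sets D_train and D_valid. The function name, the random 80/20 split, and the fixed seed are assumptions for illustration; the paper only states that the batch data D is divided into D_train and D_valid.

```python
import random

def split_batch(transitions, valid_fraction=0.2, seed=0):
    """Split a batch of (s, a, s_next, r) transitions into two
    mutually exclusive sets, D_train and D_valid.

    The 80/20 ratio and the random shuffle are illustrative
    assumptions, not details taken from the paper.
    """
    rng = random.Random(seed)
    indices = list(range(len(transitions)))
    rng.shuffle(indices)
    n_valid = int(len(transitions) * valid_fraction)
    valid_idx = set(indices[:n_valid])
    d_train = [t for i, t in enumerate(transitions) if i not in valid_idx]
    d_valid = [t for i, t in enumerate(transitions) if i in valid_idx]
    return d_train, d_valid
```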
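For the Experiment Setup row, the following sketch shows one way the quoted baseline hyperparameters could be organized, and how a MuJoCo continuous control task from the OpenAI Gym is typically instantiated with the Gym API of that era. Only the values Φ = 0.05 (BCQ) and ϵ = 0.05 (BEAR-QL) come from the paper; the dictionary layout and the specific environment ID are assumptions.

```python
import gym

# Baseline hyperparameters quoted in the paper; the dictionary
# structure itself is only illustrative.
BASELINE_HPARAMS = {
    "BCQ": {"phi": 0.05},      # perturbation scale Φ for BCQ
    "BEAR-QL": {"eps": 0.05},  # constraint threshold ϵ for BEAR-QL
}

# The paper evaluates on MuJoCo continuous control tasks in the
# OpenAI Gym; the exact environment ID below is an assumption.
env = gym.make("HalfCheetah-v2")
obs = env.reset()
action = env.action_space.sample()
obs, reward, done, info = env.step(action)
env.close()
```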