Ensemble Bootstrapping for Q-Learning
Authors: Oren Peer, Chen Tessler, Nadav Merlis, Ron Meir
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that there exist domains where both over- and under-estimation result in sub-optimal performance. Finally, we demonstrate the superior performance of a deep RL variant of EBQL over other deep QL algorithms on a suite of ATARI games. In this section, we present two main experimental results of EBQL compared to QL and DQL, in both a tabular setting and on the ATARI ALE (Bellemare et al., 2013) using the deep RL variants. |
| Researcher Affiliation | Academia | 1Viterbi Faculty of Electrical Engineering, Technion Institute of Technology, Haifa, Israel. |
| Pseudocode | Yes | Algorithm 1 Ensemble Bootstrapped Q-Learning (EBQL) |
| Open Source Code | No | The paper neither provides a link to the source code for its methodology nor explicitly states its availability. |
| Open Datasets | Yes | Here, we evaluate EBQL in a high dimensional task ATARI ALE (Bellemare et al., 2013). |
| Dataset Splits | No | The paper mentions training over 50M steps but does not specify dataset splits (e.g., percentages or counts for training, validation, and test sets) for data partitioning. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as specific GPU or CPU models. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers needed to replicate the experiments. |
| Experiment Setup | No | All hyper-parameters are identical to the baselines, as reported in (Mnih et al., 2015), including the use of target networks (see Appendix E for pseudo-codes). |
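The pseudocode row above refers to Algorithm 1 (EBQL). As a reading aid only, here is a minimal tabular sketch of the ensemble-bootstrapped update it describes: one ensemble member `k` is chosen to be updated, that member's Q-table selects the greedy next action, and the average of the *other* members' Q-values evaluates it. Function names, hyper-parameters, and the toy MDP are illustrative, not taken from the paper's implementation.

```python
import numpy as np

def ebql_update(Q, k, s, a, r, s_next, alpha=0.1, gamma=0.99, done=False):
    """One tabular EBQL update applied to ensemble member k.

    Q: array of shape (K, n_states, n_actions) holding K Q-tables.
    Member k picks the greedy next action; the mean Q-value of the
    remaining K-1 members evaluates it, which mitigates the
    over-estimation bias of single-estimator Q-learning.
    """
    K = Q.shape[0]
    if done:
        target = r
    else:
        a_star = np.argmax(Q[k, s_next])          # member k selects the action
        others = [j for j in range(K) if j != k]  # the rest evaluate it
        target = r + gamma * Q[others, s_next, a_star].mean()
    Q[k, s, a] += alpha * (target - Q[k, s, a])
    return Q

# Toy usage: 2-state, 2-action MDP with a K=3 ensemble.
rng = np.random.default_rng(0)
Q = np.zeros((3, 2, 2))
k = int(rng.integers(3))  # member to update, drawn uniformly each step
Q = ebql_update(Q, k, s=0, a=1, r=1.0, s_next=1)
```

With all tables initialized to zero, the first update moves only `Q[k, 0, 1]` toward the reward (here to `alpha * 1.0 = 0.1`), leaving the other members untouched.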