Ensemble Bootstrapping for Q-Learning

Authors: Oren Peer, Chen Tessler, Nadav Merlis, Ron Meir

ICML 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Empirically, we show that there exist domains where both over and under-estimation result in sub-optimal performance. Finally, we demonstrate the superior performance of a deep RL variant of EBQL over other deep QL algorithms for a suite of ATARI games." "In this section, we present two main experimental results of EBQL compared to QL and DQL in both a tabular setting and on the ATARI ALE (Bellemare et al., 2013) using the deep RL variants." |
| Researcher Affiliation | Academia | "Viterbi Faculty of Electrical Engineering, Technion Institute of Technology, Haifa, Israel." |
| Pseudocode | Yes | "Algorithm 1 Ensemble Bootstrapped Q-Learning (EBQL)" |
| Open Source Code | No | The paper does not provide a direct link to the source code for the methodology described in this paper or explicitly state its availability. |
| Open Datasets | Yes | "Here, we evaluate EBQL in a high dimensional task ATARI ALE (Bellemare et al., 2013)." |
| Dataset Splits | No | The paper mentions training over 50M steps but does not specify dataset splits (e.g., percentages or counts for training, validation, and test sets) for data partitioning. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as specific GPU or CPU models. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers needed to replicate the experiments. |
| Experiment Setup | No | "All hyper-parameters are identical to the baselines, as reported in (Mnih et al., 2015), including the use of target networks (see Appendix E for pseudo-codes)." |
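The pseudocode row above refers to Algorithm 1, Ensemble Bootstrapped Q-Learning: one ensemble member selects the greedy next action, and the remaining members' average value of that action forms the bootstrap target. A minimal tabular sketch of that update is below; the function name, table shapes, and hyper-parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def ebql_update(qs, k, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """One tabular EBQL update step (illustrative sketch).

    qs:   list of K Q-tables, each of shape (n_states, n_actions)
    k:    index of the ensemble member being updated this step
    """
    if done:
        target = r
    else:
        # Member k picks the greedy action at the next state...
        a_star = np.argmax(qs[k][s_next])
        # ...and the *other* members evaluate that action (ensemble bootstrap).
        peers = [qs[j][s_next, a_star] for j in range(len(qs)) if j != k]
        target = r + gamma * np.mean(peers)
    qs[k][s, a] += alpha * (target - qs[k][s, a])

# Tiny demo: K = 3 members on a 2-state, 2-action problem (illustrative only).
qs = [np.zeros((2, 2)) for _ in range(3)]
ebql_update(qs, k=0, s=0, a=0, r=1.0, s_next=1, done=False)
```

Decoupling action selection (member k) from action evaluation (the peer average) is what reduces the over-estimation bias relative to plain Q-learning, generalizing the two-table trick of Double Q-learning to K tables.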