Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Ensemble Bootstrapping for Q-Learning

Authors: Oren Peer, Chen Tessler, Nadav Merlis, Ron Meir

ICML 2021 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Empirically, we show that there exist domains where both over- and under-estimation result in sub-optimal performance. Finally, we demonstrate the superior performance of a deep RL variant of EBQL over other deep QL algorithms for a suite of ATARI games. In this section, we present two main experimental results of EBQL compared to QL and DQL in both a tabular setting and on the ATARI ALE (Bellemare et al., 2013) using the deep RL variants." |
| Researcher Affiliation | Academia | "Viterbi Faculty of Electrical Engineering, Technion Institute of Technology, Haifa, Israel." |
| Pseudocode | Yes | "Algorithm 1 Ensemble Bootstrapped Q-Learning (EBQL)" |
| Open Source Code | No | The paper does not provide a direct link to the source code for the methodology described in this paper or explicitly state its availability. |
| Open Datasets | Yes | "Here, we evaluate EBQL in a high dimensional task ATARI ALE (Bellemare et al., 2013)." |
| Dataset Splits | No | The paper mentions training over 50M steps but does not specify dataset splits (e.g., percentages or counts for training, validation, and test sets). |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as specific GPU or CPU models. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers needed to replicate the experiments. |
| Experiment Setup | No | "All hyper-parameters are identical to the baselines, as reported in (Mnih et al., 2015), including the use of target networks (see Appendix E for pseudo-codes)." |
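The paper's Algorithm 1 (tabular EBQL) maintains an ensemble of Q-tables and, at each step, updates one member using the average of the *remaining* members as the bootstrap estimator, which is the mechanism that reduces estimation bias relative to single-estimator Q-learning. The sketch below illustrates that update rule in a minimal tabular form; the function name, array layout, and hyper-parameter defaults are illustrative assumptions, not the authors' released code (the report above notes no source code was published).

```python
import numpy as np

def ebql_update(Q, k, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular EBQL-style step (illustrative sketch, not the authors' code).

    Q : ndarray of shape (K, n_states, n_actions) -- ensemble of K Q-tables.
    k : index of the ensemble member being updated this step.
    """
    K = Q.shape[0]
    # Greedy action is selected by the updated member itself.
    a_star = int(np.argmax(Q[k, s_next]))
    # Bootstrap value comes from the *other* K-1 members, averaged.
    others = [j for j in range(K) if j != k]
    bootstrap = Q[others, s_next, a_star].mean()
    target = r + gamma * bootstrap
    # Standard TD update applied only to member k.
    Q[k, s, a] += alpha * (target - Q[k, s, a])
    return Q

# Tiny usage example: 3 members, 2 states, 2 actions, all zero-initialized.
Q = np.zeros((3, 2, 2))
Q = ebql_update(Q, k=0, s=0, a=0, r=1.0, s_next=1)
```

With zero-initialized tables the bootstrap term is zero, so the first update simply moves `Q[0, 0, 0]` a fraction `alpha` of the way toward the reward. Decoupling action selection (member `k`) from value estimation (the other members) is the same bias-reduction idea as Double Q-learning, generalized to an ensemble.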