ConQUR: Mitigating Delusional Bias in Deep Q-Learning

Authors: Dijia Su, Jayden Ooi, Tyler Lu, Dale Schuurmans, Craig Boutilier

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that these methods can improve the performance of Q-learning in a variety of Atari games, sometimes dramatically.
Researcher Affiliation | Collaboration | 1. Google Research, Mountain View, California, USA; 2. Department of Electrical Engineering, Princeton University, New Jersey, USA; 3. Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada.
Pseudocode | Yes | Algorithm 1: CONQUR SEARCH (generic, depth-first); Algorithm 2: Modified Beam Search Instantiation of CONQUR.
Open Source Code | No | The paper mentions using an "open-source implementation of DQN and DDQN, with default hyperparameters (Guadarrama et al., 2018)" and obtaining pre-trained networks from the Dopamine package (Castro et al., 2018), but it does not state that the authors release their own code for the CONQUR framework or experiments.
Open Datasets | Yes | We assess the performance of CONQUR using the Atari test suite (Bellemare et al., 2013).
Dataset Splits | No | Beyond mentioning the Atari test suite, the paper does not provide training/validation/test split details such as specific percentages, counts, or references to predefined splits.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory used for the experiments.
Software Dependencies | No | The paper mentions the Dopamine package (Castro et al., 2018) and an open-source implementation of DQN and DDQN (Guadarrama et al., 2018), but it does not specify version numbers for these or other software components.
Experiment Setup | Yes | We train models using an open-source implementation of DQN and DDQN, with default hyperparameters (Guadarrama et al., 2018). We evaluate DQN(λ) and DDQN(λ) for λ ∈ {0.25, 0.5, 1, 1.5, 2}. The annealing schedule is λ_t = λ_final · t / (t + 2×10^6). We use a splitting factor of c = 4 and a frontier size of 16. The dive phase is always of length nine.
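The annealing schedule in the setup row can be sketched directly from the stated formula, λ_t = λ_final · t / (t + 2×10^6). A minimal illustration, assuming t counts training steps (the function name and the interpretation of t are ours, not from the paper):

```python
def annealed_lambda(t: int, lambda_final: float) -> float:
    """Anneal the penalty coefficient from 0 toward lambda_final.

    Implements lambda_t = lambda_final * t / (t + 2e6), as stated in
    the paper's experiment setup; t is assumed to be the training step.
    """
    return lambda_final * t / (t + 2_000_000)


if __name__ == "__main__":
    # The paper sweeps lambda_final over {0.25, 0.5, 1, 1.5, 2}.
    for lam in (0.25, 0.5, 1, 1.5, 2):
        # At t = 2e6 the schedule reaches exactly half of lambda_final.
        print(lam, annealed_lambda(2_000_000, lam))
```

Note that the schedule starts at 0 and approaches λ_final asymptotically, so the penalty is phased in gradually rather than switched on at full strength.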