ConQUR: Mitigating Delusional Bias in Deep Q-Learning

Authors: Dijia Su, Jayden Ooi, Tyler Lu, Dale Schuurmans, Craig Boutilier

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that these methods can improve the performance of Q-learning in a variety of Atari games, sometimes dramatically.
Researcher Affiliation | Collaboration | 1. Google Research, Mountain View, California, USA; 2. Department of Electrical Engineering, Princeton University, New Jersey, USA; 3. Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada.
Pseudocode | Yes | Algorithm 1: CONQUR SEARCH (generic, depth-first); Algorithm 2: Modified Beam Search Instantiation of CONQUR.
Open Source Code | No | The paper mentions using an "open-source implementation of DQN and DDQN, with default hyperparameters (Guadarrama et al., 2018)" and obtaining pre-trained networks from the Dopamine package (Castro et al., 2018), but it does not state that the authors release their own code for the CONQUR framework or experiments.
Open Datasets | Yes | We assess the performance of CONQUR using the Atari test suite (Bellemare et al., 2013).
Dataset Splits | No | Beyond mentioning the Atari test suite, the paper does not provide training/validation/test split details such as specific percentages, counts, or references to predefined splits.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory used for the experiments.
Software Dependencies | No | The paper mentions the Dopamine package (Castro et al., 2018) and an open-source implementation of DQN and DDQN (Guadarrama et al., 2018), but it does not specify version numbers for these or other software components.
Experiment Setup | Yes | We train models using an open-source implementation of DQN and DDQN, with default hyperparameters (Guadarrama et al., 2018). We evaluate DQN(λ) and DDQN(λ) for λ ∈ {0.25, 0.5, 1, 1.5, 2}. The annealing schedule is λ_t = λ_final · t / (t + 2×10^6). We use a splitting factor of c = 4 and a frontier size of 16. The dive phase is always of length nine.
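The annealing schedule in the setup row can be sketched directly from the stated formula, λ_t = λ_final · t / (t + 2×10^6). A minimal illustration, assuming t counts training steps (the function name and the interpretation of t are ours, not from the paper):

```python
def annealed_lambda(t: int, lambda_final: float) -> float:
    """Anneal the penalty coefficient from 0 toward lambda_final.

    Implements lambda_t = lambda_final * t / (t + 2e6), as stated in
    the paper's experiment setup; t is assumed to be the training step.
    """
    return lambda_final * t / (t + 2_000_000)


if __name__ == "__main__":
    # The paper sweeps lambda_final over {0.25, 0.5, 1, 1.5, 2}.
    for lam in (0.25, 0.5, 1, 1.5, 2):
        # At t = 2e6 the schedule reaches exactly half of lambda_final.
        print(lam, annealed_lambda(2_000_000, lam))
```

Note that the schedule starts at 0 and approaches λ_final asymptotically, so the penalty is phased in gradually rather than switched on at full strength.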