ConQUR: Mitigating Delusional Bias in Deep Q-Learning
Authors: Dijia Su, Jayden Ooi, Tyler Lu, Dale Schuurmans, Craig Boutilier
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that these methods can improve the performance of Q-learning in a variety of Atari games, sometimes dramatically. |
| Researcher Affiliation | Collaboration | ¹Google Research, Mountain View, California, USA; ²Department of Electrical Engineering, Princeton University, New Jersey, USA; ³Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada. |
| Pseudocode | Yes | Algorithm 1 CONQUR SEARCH (Generic, depth-first); Algorithm 2 Modified Beam Search Instantiation of CONQUR Algorithm |
| Open Source Code | No | The paper mentions using an "open-source implementation of DQN and DDQN, with default hyperparameters (Guadarrama et al., 2018)" and that pre-trained networks are obtained using the "Dopamine package (Castro et al., 2018)", but it does not state that the authors are releasing their own code for the CONQUR framework or experiments. |
| Open Datasets | Yes | We assess the performance of CONQUR using the Atari test suite (Bellemare et al., 2013). |
| Dataset Splits | No | The paper does not explicitly provide details about training/validation/test splits, such as specific percentages, counts, or references to predefined splits, beyond mentioning the 'Atari test suite'. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, processor types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions using "Dopamine package (Castro et al., 2018)" and an "open-source implementation of DQN and DDQN", but it does not specify version numbers for these or other software components. |
| Experiment Setup | Yes | We train models using an open-source implementation of DQN and DDQN, with default hyperparameters (Guadarrama et al., 2018). We evaluate DQN(λ) and DDQN(λ) for λ ∈ {0.25, 0.5, 1, 1.5, 2}. The annealing schedule is λ_t = λ_final · t / (t + 2 × 10⁶). We use a splitting factor of c = 4 and frontier size 16. The dive phase is always of length nine. |
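
The annealing schedule quoted in the Experiment Setup row can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it reads the extracted formula as λ_t = λ_final · t / (t + 2 × 10⁶), and the function name `annealed_lambda`, the parameter `half_point`, and the interpretation of `t` as a non-negative training-step counter are assumptions made here for illustration.

```python
def annealed_lambda(t: int, lambda_final: float, half_point: float = 2e6) -> float:
    """Return lambda_t = lambda_final * t / (t + half_point).

    `half_point` (assumed name) is the constant 2 * 10^6 from the quoted
    schedule; at t == half_point the weight is exactly lambda_final / 2.
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    return lambda_final * t / (t + half_point)


if __name__ == "__main__":
    # The lambda_final values evaluated in the paper, at a few sample steps.
    for lambda_final in (0.25, 0.5, 1, 1.5, 2):
        samples = [annealed_lambda(t, lambda_final) for t in (0, 2_000_000, 20_000_000)]
        print(lambda_final, [round(s, 4) for s in samples])
```

Under this reading the weight starts at zero, reaches half of λ_final after 2 × 10⁶ steps, and approaches λ_final asymptotically without ever exceeding it.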