Self-correcting Q-learning
Authors: Rong Zhu, Mattia Rigotti
AAAI 2021, pp. 11185–11192
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 6 we show the results of several experiments empirically examining these algorithms. We compare in simulations the performance of several algorithms: Q-learning, Double Q-learning, and our Self-correcting Q-learning (denoted as SCQ in the figures), with β = 1, 2, 4. |
| Researcher Affiliation | Collaboration | Rong Zhu (ISTBI, Fudan University) and Mattia Rigotti (IBM Research AI); rongzhu56@gmail.com, mr2666@columbia.edu |
| Pseudocode | Yes | Algorithm 1: Self-correcting Q-learning. |
| Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the work described in this paper, nor does it provide a direct link to such code. It only references a third-party tool's repository ('Tworek, J. 2018. vel (candidate-v0.4, accessed 2020-02-21). https://github.com/MillionIntegrals/vel'). |
| Open Datasets | Yes | A testbed that has become standard for DQN is the Atari 2600 domain popularized by the ALE Environment (Bellemare et al. 2013), that we’ll examine in the Experiments section. |
| Dataset Splits | No | The paper mentions the Atari 2600 domain and specific games, but does not specify training/validation/test splits (as percentages or counts). It discusses training parameters and multiple runs, not data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | Yes | We trained the same architecture presented in (Mnih et al. 2015) as implemented in Vel (0.4 candidate version, (Tworek 2018)). |
| Experiment Setup | Yes | Parameter settings are ϵ = 0.1, α = 0.1, and γ = 1. The parameter ϵ starts off at 1.0 and is linearly decreased to 0.1 over 1M simulation steps, while β is kept constant throughout. (Illustrative sketches of the compared updates follow below.) |
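
To make the quoted setup concrete, here is a minimal sketch of the tabular ε-greedy Q-learning baseline the paper compares against, wired to the hyperparameters quoted above (ϵ = 0.1, α = 0.1, γ = 1). The environment interface (`env.reset`, `env.step`) and the table shape are hypothetical stand-ins, not the authors' code; the paper's self-correcting update (Algorithm 1, parameterized by β) modifies the max in the target below and is not reproduced here.

```python
import numpy as np

# Minimal tabular Q-learning sketch using the hyperparameters quoted above
# (epsilon = 0.1, alpha = 0.1, gamma = 1). The `env` interface is a
# hypothetical stand-in, not taken from the paper's code.

def q_learning_episode(env, Q, epsilon=0.1, alpha=0.1, gamma=1.0, rng=None):
    """Run one episode of tabular Q-learning, updating Q (states x actions) in place."""
    rng = rng or np.random.default_rng()
    s = env.reset()
    done = False
    while not done:
        # epsilon-greedy action selection
        if rng.random() < epsilon:
            a = int(rng.integers(Q.shape[1]))
        else:
            a = int(np.argmax(Q[s]))
        s_next, r, done = env.step(a)
        # The max over next-state action values is the source of the
        # maximization bias that Double Q-learning and SCQ address.
        target = r + (0.0 if done else gamma * np.max(Q[s_next]))
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next
```

Note that γ = 1 (as quoted) means returns are undiscounted, consistent with an episodic tabular setting.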
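
Since the paper compares against Double Q-learning, a textbook sketch of that update (van Hasselt 2010) is also included for contrast; this is not the authors' implementation. Two tables decouple action selection from action evaluation, trading Q-learning's overestimation for a known underestimation, the bias trade-off the paper's self-correcting estimator is designed to balance.

```python
import numpy as np

# Textbook Double Q-learning update (van Hasselt 2010), one of the baselines
# compared in the paper. Not the authors' implementation.

def double_q_update(QA, QB, s, a, r, s_next, done, alpha=0.1, gamma=1.0, rng=None):
    """Apply one Double Q-learning step, mutating one of QA/QB in place."""
    rng = rng or np.random.default_rng()
    if rng.random() < 0.5:
        QA, QB = QB, QA  # each table is updated on roughly half the steps
    a_star = int(np.argmax(QA[s_next]))  # select the action with one table,
    q_eval = 0.0 if done else gamma * QB[s_next, a_star]  # evaluate it with the other
    QA[s, a] += alpha * (r + q_eval - QA[s, a])
```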