Self-correcting Q-learning

Authors: Rong Zhu, Mattia Rigotti

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Section 6 we show the results of several experiments empirically examining these algorithms. We compare in simulations the performance of several algorithms: Q-learning, Double Q-learning, and our Self-correcting Q-learning (denoted as SCQ in the figures), with β = 1, 2, 4.
Researcher Affiliation | Collaboration | Rong Zhu (1), Mattia Rigotti (2); (1) ISTBI, Fudan University; (2) IBM Research AI; rongzhu56@gmail.com, mr2666@columbia.edu
Pseudocode | Yes | Algorithm 1: Self-correcting Q-learning.
Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the work described in this paper, nor does it provide a direct link to such code. It only references a third-party tool's repository ('Tworek, J. 2018. vel (candidate-v0.4, accessed 2020-02-21). https://github.com/MillionIntegrals/vel.').
Open Datasets | Yes | A testbed that has become standard for DQN is the Atari 2600 domain popularized by the ALE Environment (Bellemare et al. 2013), that we’ll examine in the Experiments section.
Dataset Splits | No | The paper mentions using the 'Atari 2600 domain' and specific games, but does not provide percentages or counts for training, validation, and test splits. It discusses training parameters and multiple runs, but not data partitioning.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used to run its experiments.
Software Dependencies | Yes | We trained the same architecture presented in (Mnih et al. 2015) as implemented in Vel (0.4 candidate version, (Tworek 2018)).
Experiment Setup | Yes | Parameter settings are ϵ = 0.1, α = 0.1, and γ = 1. The parameter ϵ starts off at 1.0 and is linearly decreased to 0.1 over 1M simulation steps, while β is kept constant throughout.
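To make the quoted experiment setup concrete, the following is a minimal sketch, not the authors' code (which is not released): a tabular ε-greedy Q-learning loop combining, purely for illustration, the quoted parameter settings (α = 0.1, γ = 1) with the quoted linear ε schedule (1.0 → 0.1 over 1M steps). The `env` object and its `reset`/`step` calls are a hypothetical Gymnasium-style stand-in with discrete, hashable states; the self-correcting bootstrap term of SCQ is given by the paper's Algorithm 1 and is not reproduced here, only the standard max target that Double Q-learning and SCQ modify.

```python
# Minimal sketch (assumptions: Gymnasium-style `env`, discrete hashable states).
# Not the authors' implementation; it only illustrates the quoted hyperparameters.
import numpy as np
from collections import defaultdict


def linear_epsilon(step, start=1.0, end=0.1, decay_steps=1_000_000):
    """Linearly anneal epsilon from `start` to `end` over `decay_steps` steps."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)


def q_learning(env, num_steps=100_000, alpha=0.1, gamma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # One row of action values per visited state, initialized to zero.
    Q = defaultdict(lambda: np.zeros(env.action_space.n))

    state, _ = env.reset(seed=seed)
    for step in range(num_steps):
        eps = linear_epsilon(step)
        # epsilon-greedy action selection
        if rng.random() < eps:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))

        next_state, reward, terminated, truncated, _ = env.step(action)

        # Standard Q-learning bootstrap: max over next-state action values.
        # (Double Q-learning and the paper's SCQ replace this term.)
        target = reward + (0.0 if terminated else gamma * np.max(Q[next_state]))
        Q[state][action] += alpha * (target - Q[state][action])

        state = next_state
        if terminated or truncated:
            state, _ = env.reset()
    return Q
```

The max-operator target in the update above is the source of the overestimation bias that Double Q-learning and the paper's self-correcting estimator are designed to address.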