Self-correcting Q-learning
Authors: Rong Zhu, Mattia Rigotti
AAAI 2021, pp. 11185–11192
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 6 we show the results of several experiments empirically examining these algorithms. We compare in simulations the performance of several algorithms: Q-learning, Double Q-learning, and our Self-correcting Q-learning (denoted as SCQ in the figures), with β = 1, 2, 4. |
| Researcher Affiliation | Collaboration | Rong Zhu (ISTBI, Fudan University) and Mattia Rigotti (IBM Research AI); rongzhu56@gmail.com, mr2666@columbia.edu |
| Pseudocode | Yes | Algorithm 1: Self-correcting Q-learning. |
| Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the work described in this paper, nor does it provide a direct link to such code. It only references a third-party tool's repository ('Tworek, J. 2018. vel (candidate-v0.4, accessed 2020-02-21). https://github.com/MillionIntegrals/vel'). |
| Open Datasets | Yes | A testbed that has become standard for DQN is the Atari 2600 domain popularized by the ALE Environment (Bellemare et al. 2013), that we’ll examine in the Experiments section. |
| Dataset Splits | No | The paper mentions the Atari 2600 domain and specific games, but does not specify training/validation/test splits (as percentages or counts). It discusses training parameters and multiple runs, not data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | Yes | We trained the same architecture presented in (Mnih et al. 2015) as implemented in Vel (0.4 candidate version, (Tworek 2018)). |
| Experiment Setup | Yes | Parameter settings are ϵ = 0.1, α = 0.1, and γ = 1. The parameter ϵ starts off at 1.0 and is linearly decreased to 0.1 over 1M simulation steps, while β is kept constant throughout. (Illustrative sketches of the compared updates follow below.) |
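
To make the quoted setup concrete, here is a minimal sketch of the tabular ε-greedy Q-learning baseline the paper compares against, wired to the hyperparameters quoted above (ϵ = 0.1, α = 0.1, γ = 1). The environment interface (`env.reset`, `env.step`) and the table shape are hypothetical stand-ins, not the authors' code; the paper's self-correcting update (Algorithm 1, parameterized by β) modifies the max in the target below and is not reproduced here.

```python
import numpy as np

# Minimal tabular Q-learning sketch using the hyperparameters quoted above
# (epsilon = 0.1, alpha = 0.1, gamma = 1). The `env` interface is a
# hypothetical stand-in, not taken from the paper's code.

def q_learning_episode(env, Q, epsilon=0.1, alpha=0.1, gamma=1.0, rng=None):
    """Run one episode of tabular Q-learning, updating Q (states x actions) in place."""
    rng = rng or np.random.default_rng()
    s = env.reset()
    done = False
    while not done:
        # epsilon-greedy action selection
        if rng.random() < epsilon:
            a = int(rng.integers(Q.shape[1]))
        else:
            a = int(np.argmax(Q[s]))
        s_next, r, done = env.step(a)
        # The max over next-state action values is the source of the
        # maximization bias that Double Q-learning and SCQ address.
        target = r + (0.0 if done else gamma * np.max(Q[s_next]))
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next
```

Note that γ = 1 (as quoted) means returns are undiscounted, consistent with an episodic tabular setting.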
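
Since the paper compares against Double Q-learning, a textbook sketch of that update (van Hasselt 2010) is also included for contrast; this is not the authors' implementation. Two tables decouple action selection from action evaluation, trading Q-learning's overestimation for a known underestimation, the bias trade-off the paper's self-correcting estimator is designed to balance.

```python
import numpy as np

# Textbook Double Q-learning update (van Hasselt 2010), one of the baselines
# compared in the paper. Not the authors' implementation.

def double_q_update(QA, QB, s, a, r, s_next, done, alpha=0.1, gamma=1.0, rng=None):
    """Apply one Double Q-learning step, mutating one of QA/QB in place."""
    rng = rng or np.random.default_rng()
    if rng.random() < 0.5:
        QA, QB = QB, QA  # each table is updated on roughly half the steps
    a_star = int(np.argmax(QA[s_next]))  # select the action with one table,
    q_eval = 0.0 if done else gamma * QB[s_next, a_star]  # evaluate it with the other
    QA[s, a] += alpha * (r + q_eval - QA[s, a])
```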