Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Convergence of a Q-learning Variant for Continuous States and Actions

Authors: S. W. Carden

JAIR 2014 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "This paper presents a reinforcement learning algorithm for solving infinite-horizon Markov Decision Processes under the expected total discounted reward criterion when both the state and action spaces are continuous. This algorithm is based on Watkins' Q-learning, but uses Nadaraya-Watson kernel smoothing to generalize knowledge to unvisited states. As expected, continuity conditions must be imposed on the mean rewards and transition probabilities. Using results from kernel regression theory, this algorithm is proven capable of producing a Q-value function estimate that is uniformly within an arbitrary tolerance of the true Q-value function with probability one. The algorithm is then applied to an example problem to empirically show convergence as well."
Researcher Affiliation | Academia | "Stephen Carden EMAIL Department of Mathematical Sciences, Clemson University"
Pseudocode | Yes | Algorithm 1, pseudocode for the theoretical algorithm:

    Initialize h = bandwidth value, m = maximum iterations, γ = discount factor, ϵ = exploration parameter
    Initialize Q̂_{h,0}(s, a) = 0 for all (s, a)
    Set initial state s_1
    for i = 1 : m do
        r = Uniform(0, 1) random value
        if r < ϵ then
            a_i = random action
        else
            a_i = arg sup_{a ∈ A} Q̂_{h,i−1}(s_i, a)
        end if
        u_i = next state, r_i = reward
        y_{h,i} := r_i + γ sup_{a ∈ A} Q̂_{h,i−1}(u_i, a)
        Q̂_{h,i}(s, a) = [ Σ_{j=1}^{i} K_h((s, a) − (s_j, a_j)) y_{h,j} ] / [ Σ_{j=1}^{i} K_h((s, a) − (s_j, a_j)) ]
        s_{i+1} = u_i
    end for
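The update in Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the paper's MATLAB implementation: the Gaussian kernel, the finite action grid used to approximate the sup over actions, and the class and method names are all assumptions introduced here.

```python
import numpy as np

def gaussian_kernel(d, h):
    # Gaussian smoothing kernel K_h applied to an array of distances d (assumed kernel choice)
    return np.exp(-0.5 * (d / h) ** 2)

class KernelQ:
    """Nadaraya-Watson estimate of the Q-function from visited (s, a) pairs."""

    def __init__(self, h, gamma, actions):
        self.h, self.gamma = h, gamma
        self.actions = actions  # finite grid approximating the continuous action space
        self.points = []        # visited (s_j, a_j) pairs
        self.targets = []       # corresponding y_{h,j} values

    def q(self, s, a):
        # Q-hat_{h,0} = 0 everywhere before any data is observed
        if not self.points:
            return 0.0
        pts = np.array(self.points)
        dists = np.linalg.norm(pts - np.array([s, a]), axis=1)
        w = gaussian_kernel(dists, self.h)
        denom = w.sum()
        # Kernel-weighted average of the stored targets (the Nadaraya-Watson ratio)
        return float(w @ np.array(self.targets) / denom) if denom > 0 else 0.0

    def best_action(self, s):
        # Approximates arg sup_{a in A} Q-hat(s, a) over the finite action grid
        return max(self.actions, key=lambda a: self.q(s, a))

    def update(self, s, a, r, u):
        # y_{h,i} = r_i + gamma * sup_a Q-hat_{h,i-1}(u_i, a), then store the new point
        y = r + self.gamma * self.q(u, self.best_action(u))
        self.points.append([s, a])
        self.targets.append(y)
```

An ϵ-greedy loop would then alternate `best_action` / random actions and call `update` on each observed transition, mirroring the for-loop in Algorithm 1.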
Open Source Code | Yes | "For the full source code, see the online appendix associated with this publication."
Open Datasets | Yes | "In this section we detail an application to the Mountain Car problem (Moore, 1991)."
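For context, the Mountain Car transition dynamics can be sketched as below, using the standard textbook constants (position in [-1.2, 0.6], velocity clipped to ±0.07, force in [-1, 1]); the paper's exact formulation and reward convention may differ, so treat the constants and the reward here as assumptions.

```python
import math

def mountain_car_step(x, v, a, min_x=-1.2, max_x=0.6, max_v=0.07):
    """One transition of Mountain Car with a continuous force a in [-1, 1].

    Standard textbook constants; illustrative only, not the paper's setup.
    """
    v = v + 0.001 * a - 0.0025 * math.cos(3 * x)   # gravity term from the hill shape
    v = max(-max_v, min(max_v, v))                  # clip velocity
    x = x + v
    if x < min_x:                                   # inelastic collision at the left wall
        x, v = min_x, 0.0
    done = x >= max_x                               # goal at the top of the right hill
    reward = 0.0 if done else -1.0                  # -1 per step until the goal (assumed)
    return x, v, reward, done
```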
Dataset Splits | No | The paper describes the setup of the Mountain Car problem, a simulation environment, but does not specify training, validation, or test splits in the traditional sense of partitioning a pre-collected dataset.
Hardware Specification | Yes | "Implementation was in MATLAB 2012b in Ubuntu 12.04 on hardware with an Intel Xeon 3.47 gigahertz processor and 24 gigabytes of RAM."
Software Dependencies | Yes | "Implementation was in MATLAB 2012b in Ubuntu 12.04 on hardware with an Intel Xeon 3.47 gigahertz processor and 24 gigabytes of RAM."
Experiment Setup | Yes | "Parameter values were bandwidth h = .2, exploration parameter ϵ = .9, discount factor γ = .9, and k = 20 successful episodes to initialize with."
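The reported settings, and the ϵ-greedy action choice they parameterize, can be collected in a short sketch; the `PARAMS` dict and the `epsilon_greedy` helper are illustrative names introduced here, and `q` is a stand-in for any callable Q-estimate.

```python
import random

# Settings reported in the paper's Mountain Car experiment
PARAMS = {"h": 0.2, "epsilon": 0.9, "gamma": 0.9, "init_episodes": 20}

def epsilon_greedy(q, state, actions, epsilon=PARAMS["epsilon"]):
    """With probability epsilon take a random action, else the greedy one.

    q is any callable q(state, action), e.g. a kernel-smoothed Q-estimate.
    """
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q(state, a))
```

With ϵ = .9 the agent explores 90% of the time, which matches the algorithm's reliance on densely sampling the state-action space for the kernel estimate.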