Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Addressing Environment Non-Stationarity by Repeating Q-learning Updates

Authors: Sherief Abdallah, Michael Kaisers

JMLR 2016 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results confirm the theoretical insights and show how RUQL outperforms both QL and the closest state-of-the-art algorithms in noisy non-stationary environments.
Researcher Affiliation | Academia | Sherief Abdallah: The British University in Dubai, P.O. Box 345015, Block 11, DIAC, Dubai, United Arab Emirates; University of Edinburgh, United Kingdom. Michael Kaisers: Centrum Wiskunde & Informatica, P.O. Box 94079, 1090 GB Amsterdam, The Netherlands; Maastricht University, The Netherlands.
Pseudocode | Yes | Algorithm 1: Q-learning Algorithm; Algorithm 2: RUQL (Impractical Implementation); Algorithm 3: Dyna-Q
Open Source Code | No | The paper does not explicitly state that source code for the methodology described is made publicly available, nor does it provide a link to a repository.
Open Datasets | No | The paper uses variations of standard domains such as the multi-armed bandit problem, Prisoner's Dilemma, 1-Row Domain, Taxi Domain (Dietterich, 2000), and Mines Domain (RL-GLUE software (Tanner and White, 2009)). While these are established domains or software, the paper describes its own specific variations and does not provide concrete access information (e.g., links, specific dataset files, DOIs) for these modified environments or any generated data.
Dataset Splits | No | The paper describes experimental simulations in various domains, often running for a fixed number of time steps (e.g., "average payoff over 2000 consecutive time steps", "1200000 consecutive time steps"). These are simulation environments and games, not typical supervised learning datasets with explicit training, validation, and test splits.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions "RL-GLUE software (Tanner and White, 2009)" for the Mines domain, but it does not specify a version number for RL-GLUE or for any other software libraries, frameworks, or programming languages.
Experiment Setup | Yes | We conducted the experiments over a range of parameter values, including D ∈ {250, 500, }, p0, p1 ∈ {0, 0.003, 0.01, 0.03, 0.1}, τ ∈ {0.1, 0.3, 1}, α ∈ {0.001, 0.01, 0.1, 1}, and γ = 0. For FAQL, we used the square root of these α values for both α_FAQ and β_FAQ, so that FAQL has an effective learning rate α_FAQ · β_FAQ = α.
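For orientation, the paper's core idea (named in the title and in the Pseudocode row) is to repeat the Q-learning update for a state-action pair in inverse proportion to the probability of selecting that action. The sketch below is a minimal illustration, not the authors' code: it assumes a tabular Q array and a known selection probability `pi_sa`, and uses the closed-form weighting that is equivalent to repeating the standard update 1/π(s,a) times.

```python
import numpy as np

def q_update(q, s, a, r, s2, alpha, gamma):
    # Standard Q-learning: one step toward the bootstrapped target.
    target = r + gamma * np.max(q[s2])
    q[s, a] += alpha * (target - q[s, a])

def ruql_update(q, s, a, r, s2, alpha, gamma, pi_sa):
    # RUQL sketch: repeating the update 1/pi(s,a) times is equivalent to
    # shrinking the old estimate by (1 - alpha)^(1/pi(s,a)), so rarely
    # selected actions get a larger effective step size.
    w = (1.0 - alpha) ** (1.0 / pi_sa)
    target = r + gamma * np.max(q[s2])
    q[s, a] = w * q[s, a] + (1.0 - w) * target
```

When `pi_sa == 1` the two rules coincide; as `pi_sa` shrinks, the RUQL update weights the new target more heavily, which is what lets it track non-stationary environments.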