Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Addressing Environment Non-Stationarity by Repeating Q-learning Updates

Authors: Sherief Abdallah, Michael Kaisers

JMLR 2016 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results confirm the theoretical insights and show how RUQL outperforms both QL and the closest state-of-the-art algorithms in noisy non-stationary environments.
Researcher Affiliation | Academia | Sherief Abdallah: The British University in Dubai, P.O. Box 345015, Block 11, DIAC, Dubai, United Arab Emirates; University of Edinburgh, United Kingdom. Michael Kaisers: Centrum Wiskunde & Informatica, P.O. Box 94079, 1090 GB Amsterdam, The Netherlands; Maastricht University, The Netherlands.
Pseudocode | Yes | Algorithm 1: Q-learning Algorithm; Algorithm 2: RUQL (Impractical Implementation); Algorithm 3: Dyna-Q
Open Source Code | No | The paper does not explicitly state that source code for the methodology described is made publicly available, nor does it provide a link to a repository.
Open Datasets | No | The paper uses variations of standard domains such as the multi-armed bandit problem, Prisoner's Dilemma, 1-Row Domain, Taxi Domain (Dietterich, 2000), and Mines Domain (RL-GLUE software (Tanner and White, 2009)). While these are established domains or software, the paper describes its own specific variations and does not provide concrete access information (e.g., links, specific dataset files, DOIs) for these modified environments or any generated data.
Dataset Splits | No | The paper describes experimental simulations in various domains, often running for a fixed number of time steps (e.g., "average payoff over 2000 consecutive time steps", "1200000 consecutive time steps"). These are simulation environments and games, not typical supervised learning datasets with explicit training, validation, and test splits.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions "RL-GLUE software (Tanner and White, 2009)" for the Mines domain, but it does not specify a version number for RL-GLUE or for any other software libraries, frameworks, or programming languages.
Experiment Setup | Yes | We conducted the experiments over a range of parameter values, including D ∈ {250, 500, }, p0, p1 ∈ {0, 0.003, 0.01, 0.03, 0.1}, τ ∈ {0.1, 0.3, 1}, α ∈ {0.001, 0.01, 0.1, 1}, and γ = 0. For FAQL, we used the square root of these α values for both α_FAQ and β_FAQ, so that FAQL has an effective learning rate α_FAQ · β_FAQ = α.
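For orientation, the paper's core idea (named in the title and in the Pseudocode row) is to repeat the Q-learning update for a state-action pair in inverse proportion to the probability of selecting that action. The sketch below is a minimal illustration, not the authors' code: it assumes a tabular Q array and a known selection probability `pi_sa`, and uses the closed-form weighting that is equivalent to repeating the standard update 1/π(s,a) times.

```python
import numpy as np

def q_update(q, s, a, r, s2, alpha, gamma):
    # Standard Q-learning: one step toward the bootstrapped target.
    target = r + gamma * np.max(q[s2])
    q[s, a] += alpha * (target - q[s, a])

def ruql_update(q, s, a, r, s2, alpha, gamma, pi_sa):
    # RUQL sketch: repeating the update 1/pi(s,a) times is equivalent to
    # shrinking the old estimate by (1 - alpha)^(1/pi(s,a)), so rarely
    # selected actions get a larger effective step size.
    w = (1.0 - alpha) ** (1.0 / pi_sa)
    target = r + gamma * np.max(q[s2])
    q[s, a] = w * q[s, a] + (1.0 - w) * target
```

When `pi_sa == 1` the two rules coincide; as `pi_sa` shrinks, the RUQL update weights the new target more heavily, which is what lets it track non-stationary environments.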