A Unified Switching System Perspective and Convergence Analysis of Q-Learning Algorithms
Authors: Donghwan Lee, Niao He
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Simulated trajectories of the O.D.E. model of Q-learning, including the upper and lower comparison systems, are depicted in Figure 1. Simulated trajectories of the O.D.E. model of averaging Q-learning, including the upper and lower comparison systems, are depicted in Figure 2 for the $Q^A_t$ part. The simulation study empirically justifies the bounding principles and asymptotic convergence established in theory. (A hedged Euler-integration sketch of the nominal O.D.E. model appears after the table.) |
| Researcher Affiliation | Academia | Donghwan Lee (Korea Advanced Institute of Science and Technology, donghwan@kaist.ac.kr); Niao He (UIUC & ETH Zurich, niao.he@inf.ethz.ch) |
| Pseudocode | No | The paper describes mathematical updates for algorithms but does not provide any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating the release of open-source code for its methodology. |
| Open Datasets | No | The paper defines an MDP example directly in the text for numerical simulations (e.g., "Consider an MDP with $S = \{1, 2\}$, $A = \{1, 2\}$, $\gamma = 0.9$, $P_1 = \begin{bmatrix} 0.2 & 0.8 \\ 0.3 & 0.7 \end{bmatrix}$, $P_2 = \begin{bmatrix} 0.5 & 0.5 \\ 0.7 & 0.3 \end{bmatrix}$"), which is a synthetic example and not referred to as a publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper does not use standard train/validation/test splits as it focuses on theoretical analysis and numerical simulations of ODE models rather than empirical evaluation on a traditional dataset. The MDP used in simulations is defined in the text, not split. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for the numerical simulations. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used for its numerical simulations. |
| Experiment Setup | Yes | Consider an MDP with $S = \{1, 2\}$, $A = \{1, 2\}$, $\gamma = 0.9$, $P_1 = \begin{bmatrix} 0.2 & 0.8 \\ 0.3 & 0.7 \end{bmatrix}$, $P_2 = \begin{bmatrix} 0.5 & 0.5 \\ 0.7 & 0.3 \end{bmatrix}$, and a behavior policy $\beta$ such that $P[a = 1 \mid s = 1] = 0.2$, $P[a = 2 \mid s = 1] = 0.8$, $P[a = 1 \mid s = 2] = 0.7$, $P[a = 2 \mid s = 2] = 0.3$. (A simulation sketch using this setup follows the table.) |
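
To make the quoted setup concrete, here is a minimal sketch of the tabular Q-learning iteration the paper analyzes, instantiated on the MDP and behavior policy above. The reward table `r`, the step-size schedule, and the run length are assumptions, since the excerpt does not specify them.

```python
import numpy as np

# MDP from the paper's experiment setup: 2 states, 2 actions, gamma = 0.9.
# P[a][s][s'] is the probability of moving from s to s' under action a + 1.
gamma = 0.9
P = np.array([[[0.2, 0.8],
               [0.3, 0.7]],   # P1 (action 1)
              [[0.5, 0.5],
               [0.7, 0.3]]])  # P2 (action 2)

# Behavior policy beta[s][a] = P[a | s], as quoted in the table.
beta = np.array([[0.2, 0.8],
                 [0.7, 0.3]])

# ASSUMPTION: the excerpt does not state the reward function, so a fixed
# random reward table r(s, a) stands in for it here.
rng = np.random.default_rng(0)
r = rng.uniform(size=(2, 2))

# Standard asynchronous Q-learning update along a single trajectory:
# Q(s, a) <- Q(s, a) + alpha_t * (r(s, a) + gamma * max_a' Q(s', a') - Q(s, a)).
Q = np.zeros((2, 2))
s = 0
for t in range(1, 100_000):
    a = rng.choice(2, p=beta[s])       # action sampled from the behavior policy
    s_next = rng.choice(2, p=P[a, s])  # next state sampled from P_a
    alpha = 1.0 / t ** 0.8             # assumed diminishing step size
    Q[s, a] += alpha * (r[s, a] + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print(Q)  # estimate of the optimal Q-function under the assumed rewards
```

The diminishing step size is one common choice satisfying the usual Robbins-Monro conditions; the paper's analysis covers such schedules in general rather than prescribing this particular exponent.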
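
The Figure 1 trajectories mentioned in the Research Type cell come from the O.D.E. model of Q-learning. Below is a forward-Euler sketch of that nominal O.D.E., assuming the standard switching-system form $\dot{Q}_t = D(R + \gamma \bar{P}\,\Pi_{Q_t} Q_t - Q_t)$, where $D$ is the diagonal matrix of stationary state-action visitation probabilities under $\beta$, $\bar{P}$ maps state-action pairs to next states, and $\Pi_{Q} Q$ is the greedy (max over actions) term. The reward is again a placeholder, and the paper's upper and lower comparison systems, which replace the switching greedy policy with frozen policy matrices, are not reproduced here.

```python
import numpy as np

gamma = 0.9
# Transition matrices P[a][s][s'] and behavior policy beta[s][a] from the setup.
P = np.array([[[0.2, 0.8],
               [0.3, 0.7]],
              [[0.5, 0.5],
               [0.7, 0.3]]])
beta = np.array([[0.2, 0.8],
                 [0.7, 0.3]])

# ASSUMPTION: placeholder reward vector in flattened (s, a) order.
rng = np.random.default_rng(0)
r = rng.uniform(size=4)

# State chain under beta and its stationary distribution d:
# P_beta[s][s'] = sum_a beta[s][a] * P[a][s][s'].
P_beta = np.einsum('sa,ast->st', beta, P)
w, v = np.linalg.eig(P_beta.T)
d = np.real(v[:, np.argmax(np.real(w))])
d /= d.sum()

# Diagonal visitation weights D and the (s, a) -> s' transition matrix P_bar,
# both in flattened (s, a) order (index 2*s + a).
D = np.array([d[s] * beta[s, a] for s in range(2) for a in range(2)])
P_bar = np.array([P[a, s] for s in range(2) for a in range(2)])

# Forward-Euler integration of dq/dt = D (r + gamma * P_bar @ m(q) - q),
# where m(q)[s'] = max_a q(s', a) is the greedy switching term.
q = np.zeros(4)
h = 0.01
for _ in range(200_000):
    m = q.reshape(2, 2).max(axis=1)
    q = q + h * D * (r + gamma * P_bar @ m - q)

# At equilibrium, D > 0 forces q = r + gamma * P_bar @ m(q), i.e. the Bellman
# optimality equation, so this should approach the same Q* as the sketch above.
print(q.reshape(2, 2))
```

Under the paper's bounding principle, trajectories of this nominal system stay between those of the corresponding lower and upper comparison systems, which is what Figures 1 and 2 illustrate.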