A Unified Switching System Perspective and Convergence Analysis of Q-Learning Algorithms
Authors: Donghwan Lee, Niao He
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Simulated trajectories of the O.D.E. model of Q-learning, including the upper and lower comparison systems, are depicted in Figure 1. Simulated trajectories of the O.D.E. model of averaging Q-learning, including the upper and lower comparison systems, are depicted in Figure 2 for the $Q^A_t$ part. The simulation study empirically justifies the bounding principles and asymptotic convergence established in theory. (A hedged Euler-integration sketch of the nominal O.D.E. model appears after the table.) |
| Researcher Affiliation | Academia | Donghwan Lee (Korea Advanced Institute of Science and Technology, donghwan@kaist.ac.kr); Niao He (UIUC & ETH Zurich, niao.he@inf.ethz.ch) |
| Pseudocode | No | The paper describes mathematical updates for algorithms but does not provide any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating the release of open-source code for its methodology. |
| Open Datasets | No | The paper defines an MDP example directly in the text for numerical simulations (e.g., "Consider an MDP with $S = \{1, 2\}$, $A = \{1, 2\}$, $\gamma = 0.9$, $P_1 = \begin{bmatrix} 0.2 & 0.8 \\ 0.3 & 0.7 \end{bmatrix}$, $P_2 = \begin{bmatrix} 0.5 & 0.5 \\ 0.7 & 0.3 \end{bmatrix}$"), which is a synthetic example and not referred to as a publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper does not use standard train/validation/test splits as it focuses on theoretical analysis and numerical simulations of ODE models rather than empirical evaluation on a traditional dataset. The MDP used in simulations is defined in the text, not split. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for the numerical simulations. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used for its numerical simulations. |
| Experiment Setup | Yes | Consider an MDP with $S = \{1, 2\}$, $A = \{1, 2\}$, $\gamma = 0.9$, $P_1 = \begin{bmatrix} 0.2 & 0.8 \\ 0.3 & 0.7 \end{bmatrix}$, $P_2 = \begin{bmatrix} 0.5 & 0.5 \\ 0.7 & 0.3 \end{bmatrix}$, and a behavior policy $\beta$ such that $P[a = 1 \mid s = 1] = 0.2$, $P[a = 2 \mid s = 1] = 0.8$, $P[a = 1 \mid s = 2] = 0.7$, $P[a = 2 \mid s = 2] = 0.3$. (A simulation sketch using this setup follows the table.) |
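
To make the quoted setup concrete, here is a minimal sketch of the tabular Q-learning iteration the paper analyzes, instantiated on the MDP and behavior policy above. The reward table `r`, the step-size schedule, and the run length are assumptions, since the excerpt does not specify them.

```python
import numpy as np

# MDP from the paper's experiment setup: 2 states, 2 actions, gamma = 0.9.
# P[a][s][s'] is the probability of moving from s to s' under action a + 1.
gamma = 0.9
P = np.array([[[0.2, 0.8],
               [0.3, 0.7]],   # P1 (action 1)
              [[0.5, 0.5],
               [0.7, 0.3]]])  # P2 (action 2)

# Behavior policy beta[s][a] = P[a | s], as quoted in the table.
beta = np.array([[0.2, 0.8],
                 [0.7, 0.3]])

# ASSUMPTION: the excerpt does not state the reward function, so a fixed
# random reward table r(s, a) stands in for it here.
rng = np.random.default_rng(0)
r = rng.uniform(size=(2, 2))

# Standard asynchronous Q-learning update along a single trajectory:
# Q(s, a) <- Q(s, a) + alpha_t * (r(s, a) + gamma * max_a' Q(s', a') - Q(s, a)).
Q = np.zeros((2, 2))
s = 0
for t in range(1, 100_000):
    a = rng.choice(2, p=beta[s])       # action sampled from the behavior policy
    s_next = rng.choice(2, p=P[a, s])  # next state sampled from P_a
    alpha = 1.0 / t ** 0.8             # assumed diminishing step size
    Q[s, a] += alpha * (r[s, a] + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print(Q)  # estimate of the optimal Q-function under the assumed rewards
```

The diminishing step size is one common choice satisfying the usual Robbins-Monro conditions; the paper's analysis covers such schedules in general rather than prescribing this particular exponent.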
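
The Figure 1 trajectories mentioned in the Research Type cell come from the O.D.E. model of Q-learning. Below is a forward-Euler sketch of that nominal O.D.E., assuming the standard switching-system form $\dot{Q}_t = D(R + \gamma \bar{P}\,\Pi_{Q_t} Q_t - Q_t)$, where $D$ is the diagonal matrix of stationary state-action visitation probabilities under $\beta$, $\bar{P}$ maps state-action pairs to next states, and $\Pi_{Q} Q$ is the greedy (max over actions) term. The reward is again a placeholder, and the paper's upper and lower comparison systems, which replace the switching greedy policy with frozen policy matrices, are not reproduced here.

```python
import numpy as np

gamma = 0.9
# Transition matrices P[a][s][s'] and behavior policy beta[s][a] from the setup.
P = np.array([[[0.2, 0.8],
               [0.3, 0.7]],
              [[0.5, 0.5],
               [0.7, 0.3]]])
beta = np.array([[0.2, 0.8],
                 [0.7, 0.3]])

# ASSUMPTION: placeholder reward vector in flattened (s, a) order.
rng = np.random.default_rng(0)
r = rng.uniform(size=4)

# State chain under beta and its stationary distribution d:
# P_beta[s][s'] = sum_a beta[s][a] * P[a][s][s'].
P_beta = np.einsum('sa,ast->st', beta, P)
w, v = np.linalg.eig(P_beta.T)
d = np.real(v[:, np.argmax(np.real(w))])
d /= d.sum()

# Diagonal visitation weights D and the (s, a) -> s' transition matrix P_bar,
# both in flattened (s, a) order (index 2*s + a).
D = np.array([d[s] * beta[s, a] for s in range(2) for a in range(2)])
P_bar = np.array([P[a, s] for s in range(2) for a in range(2)])

# Forward-Euler integration of dq/dt = D (r + gamma * P_bar @ m(q) - q),
# where m(q)[s'] = max_a q(s', a) is the greedy switching term.
q = np.zeros(4)
h = 0.01
for _ in range(200_000):
    m = q.reshape(2, 2).max(axis=1)
    q = q + h * D * (r + gamma * P_bar @ m - q)

# At equilibrium, D > 0 forces q = r + gamma * P_bar @ m(q), i.e. the Bellman
# optimality equation, so this should approach the same Q* as the sketch above.
print(q.reshape(2, 2))
```

Under the paper's bounding principle, trajectories of this nominal system stay between those of the corresponding lower and upper comparison systems, which is what Figures 1 and 2 illustrate.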