On Q-learning Convergence for Non-Markov Decision Processes
Authors: Sultan Javed Majeed, Marcus Hutter
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper states: "In Section 7 we numerically evaluate Q-learning on a few non-MDP toy domains." |
| Researcher Affiliation | Academia | Research School of Computer Science, Australian National University, Australia |
| Pseudocode | No | The information is insufficient. The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The information is insufficient. The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | The information is insufficient. The paper describes toy examples (Non-Markovian Reward Process, Non-Markovian Decision Process) but does not provide concrete access information for a publicly available or open dataset. |
| Dataset Splits | No | The information is insufficient. The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology). |
| Hardware Specification | No | The information is insufficient. The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The information is insufficient. The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | The paper reports two configurations: learning curves of Q-learning averaged over 40 independent runs with γ = 0.9, q0(s = 0, x) = 8, and q0(s = 1, x) = 3; and learning curves averaged over 50 independent runs with γ = 0.9, pmin = 0.01, and q0(s, a) = 0 for all s and a. |
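
The reported setup is standard tabular Q-learning, so a minimal sketch may help make the parameters concrete. Only γ = 0.9, the initial Q-values, and the number of averaged runs come from the table above; the two-state environment, ε-greedy exploration, learning-rate schedule, and run length below are assumptions for illustration, not the paper's actual toy domains.

```python
# Minimal sketch of tabular Q-learning with the reported parameters.
# The environment here is a hypothetical stand-in, NOT the paper's non-MDP domain.
import numpy as np

GAMMA = 0.9          # discount factor reported in the paper
N_RUNS = 40          # number of independent runs averaged for the learning curves
N_STEPS = 10_000     # horizon per run (assumed; not stated in the table)
N_STATES, N_ACTIONS = 2, 2

def step(state, action, rng):
    """Hypothetical stochastic transition and reward; placeholder dynamics."""
    next_state = rng.integers(N_STATES)
    reward = float(action == next_state)
    return next_state, reward

def run_q_learning(seed):
    rng = np.random.default_rng(seed)
    # Initialisation as reported: q0(s = 0, ·) = 8, q0(s = 1, ·) = 3.
    q = np.array([[8.0] * N_ACTIONS, [3.0] * N_ACTIONS])
    state = 0
    curve = np.empty(N_STEPS)
    for t in range(N_STEPS):
        # ε-greedy action selection (assumed exploration scheme).
        action = rng.integers(N_ACTIONS) if rng.random() < 0.1 else int(q[state].argmax())
        next_state, reward = step(state, action, rng)
        alpha = 1.0 / (t + 1)  # decaying learning rate (assumed schedule)
        td_target = reward + GAMMA * q[next_state].max()
        q[state, action] += alpha * (td_target - q[state, action])
        curve[t] = q[0, 0]     # track one Q-value as a learning curve
        state = next_state
    return curve

# Average the learning curves over 40 independent runs, as in the paper's plots.
mean_curve = np.mean([run_q_learning(s) for s in range(N_RUNS)], axis=0)
```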