On Q-learning Convergence for Non-Markov Decision Processes

Authors: Sultan Javed Majeed, Marcus Hutter

IJCAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Section 7 we numerically evaluate Q-learning on a few non-MDP toy domains.
Researcher Affiliation | Academia | Research School of Computer Science, Australian National University, Australia
Pseudocode | No | The information is insufficient. The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The information is insufficient. The paper does not provide concrete access to source code for the methodology described.
Open Datasets | No | The information is insufficient. The paper describes toy examples (a non-Markovian reward process and a non-Markovian decision process) but does not provide concrete access information for a publicly available or open dataset.
Dataset Splits | No | The information is insufficient. The paper does not provide dataset split information (exact percentages, sample counts, or a detailed splitting methodology).
Hardware Specification | No | The information is insufficient. The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used to run its experiments.
Software Dependencies | No | The information is insufficient. The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiments.
Experiment Setup | Yes | In one experiment, the learning curves of Q-learning are averaged over 40 independent runs with the parameters γ = 0.9, q_0(s = 0, x) = 8 and q_0(s = 1, x) = 3; in another, they are averaged over 50 independent runs with the parameters γ = 0.9, p_min = 0.01 and q_0(s, a) = 0 for all s and a.
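To give a concrete sense of what reproducing the reported setup might involve, below is a minimal Q-learning sketch in Python. Only the discount γ = 0.9, the number of independent runs, and the optimistic initial Q-values are taken from the Experiment Setup row above; the toy-domain dynamics, learning rate, exploration scheme, number of steps, and the tracked quantity are assumptions made for illustration, not the paper's exact domains.

```python
import numpy as np

# Hypothetical 2-state, 2-action toy domain with a non-Markovian reward:
# the reward depends on the previous action as well as the current one.
# This is NOT the exact domain from the paper; it only illustrates the
# reported setup (gamma = 0.9, averaging over independent runs,
# initial Q-values q_0(s=0, x) = 8 and q_0(s=1, x) = 3).

N_STATES, N_ACTIONS = 2, 2
GAMMA = 0.9
N_RUNS = 40          # "averaged over 40 independent runs"
N_STEPS = 10_000     # assumed; the episode/step budget is not reported
ALPHA = 0.1          # assumed constant learning rate (not reported)
EPSILON = 0.1        # assumed epsilon-greedy exploration (not reported)


def step(state, action, prev_action, rng):
    """Assumed toy dynamics: the next state is drawn uniformly, and the
    reward depends on the current and previous action (non-Markovian)."""
    next_state = int(rng.integers(N_STATES))
    reward = 1.0 if action == prev_action else 0.0
    return next_state, reward


def run_q_learning(seed):
    rng = np.random.default_rng(seed)
    # Initial values as reported: q_0(s=0, x) = 8, q_0(s=1, x) = 3.
    q = np.empty((N_STATES, N_ACTIONS))
    q[0, :], q[1, :] = 8.0, 3.0
    state, prev_action = 0, 0
    history = np.empty(N_STEPS)
    for t in range(N_STEPS):
        # Epsilon-greedy action selection (assumed).
        if rng.random() < EPSILON:
            action = int(rng.integers(N_ACTIONS))
        else:
            action = int(np.argmax(q[state]))
        next_state, reward = step(state, action, prev_action, rng)
        # Standard Q-learning update with discount gamma = 0.9.
        td_target = reward + GAMMA * np.max(q[next_state])
        q[state, action] += ALPHA * (td_target - q[state, action])
        history[t] = q[0, 0]   # track one Q-value as a learning curve
        state, prev_action = next_state, action
    return history


# Average the learning curves over independent runs, as reported.
curves = np.stack([run_q_learning(seed) for seed in range(N_RUNS)])
mean_curve = curves.mean(axis=0)
print(mean_curve[-5:])
```

The second reported configuration (50 runs, p_min = 0.01, q_0(s, a) = 0) would follow the same pattern with N_RUNS = 50, zero-initialized Q-values, and an exploration scheme that keeps every action's probability above p_min; since the paper does not specify these implementation details, they would have to be chosen by the reproducer.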