Increasingly Cautious Optimism for Practical PAC-MDP Exploration

Authors: Liangpeng Zhang, Ke Tang, Xin Yao

Venue: IJCAI 2015

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We prove that both ICR and ICV are PAC-MDP, and show that their improvement is guaranteed by a tighter sample complexity upper bound. Then, we demonstrate their significantly improved performance through empirical results." |
| Researcher Affiliation | Academia | Liangpeng Zhang (1), Ke Tang (1) and Xin Yao (2); (1) UBRI, School of Computer Science and Technology, University of Science and Technology of China; (2) CERCIA, School of Computer Science, University of Birmingham, United Kingdom |
| Pseudocode | Yes | "The resulting pseudo-code for ICR is given in Algorithm 1. ... The resulting pseudo-code for ICV is given in Algorithm 2." |
| Open Source Code | No | The paper provides a link for 'Details' about the generated mazes (http://staff.ustc.edu.cn/~ketang/codes/IJCAI15ICO.html), but it does not explicitly state that source code for the described methodology is released or available at this link. |
| Open Datasets | No | The paper describes conducting experiments in a 'Complex Maze' environment and generating mazes, but it does not provide concrete access information (link, DOI, repository, or formal citation) for a publicly available dataset used for training. |
| Dataset Splits | No | The paper describes a 'test process' carried out during learning, but it does not specify training, validation, or test dataset splits as percentages, counts, or references to predefined splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions 'Value Iteration' and a 'Bellman error threshold' but does not list any specific software libraries, frameworks, or version numbers used in the implementation. |
| Experiment Setup | Yes | "In our experiments, the average number of timesteps the agent needs to discover a near-optimal policy, rather than the average cumulative reward, is used as the performance metric... If the agent fails to find a 0.1ρ-optimal policy within tmax = 300000 steps, then a timeout is reported... We used a continuing task setting with γ = 0.998. ... The threshold of the Bellman error in Value Iteration was set to 0.01. ... By trial-and-error on the parameters, we found that setting m = 5 for R-MAX and V-MAX produces the best results in this learning task. ... The best parameter found for OIM is R0 = 0.05Rmax, and for MoRMAX it is m = 3. For ICR and ICV, although there appear to be three parameters, we found that the trivial setting m0 = 2, mmax = tmax is sufficient for all tasks in our experiments. Meanwhile, the best Δm found was 1/7000 for ICR and 1/5000 for ICV." |
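
The Experiment Setup row gathers the hyperparameters the paper reports in prose. Below is a minimal sketch of how those reported values could be collected for a re-run attempt. The dictionary layout, the variable names, and the `m_schedule` helper (a linear growth of the known-ness threshold m from m0 toward mmax at rate Δm per timestep) are my own assumptions for illustration; they are not code from the paper, whose implementation is not released, and the schedule may not match Algorithms 1-2 exactly.

```python
# Hedged sketch: experiment settings as quoted from the paper's text.
# Names and the linear m-schedule interpretation are assumptions.

T_MAX = 300_000  # timeout (steps) to discover a 0.1*rho-optimal policy

CONFIG = {
    "gamma": 0.998,                    # discount factor, continuing task
    "bellman_error_threshold": 0.01,   # Value Iteration stopping threshold
    "t_max": T_MAX,
    "m_rmax": 5,                       # best m found for R-MAX
    "m_vmax": 5,                       # best m found for V-MAX
    "oim_R0_fraction": 0.05,           # OIM: R0 = 0.05 * Rmax
    "m_mormax": 3,                     # best m found for MoRMAX
    "icr": {"m0": 2, "m_max": T_MAX, "delta_m": 1 / 7000},
    "icv": {"m0": 2, "m_max": T_MAX, "delta_m": 1 / 5000},
}


def m_schedule(t: int, m0: int, m_max: int, delta_m: float) -> int:
    """Assumed 'increasingly cautious' schedule: the known-ness threshold m
    grows linearly from m0 at rate delta_m per timestep, capped at m_max.
    Inferred from the reported (m0, mmax, delta_m) parameters; not verified
    against the paper's pseudo-code."""
    return min(m_max, int(m0 + delta_m * t))


# Example: with the ICR parameters, the threshold after 70,000 steps would be
# m_schedule(70_000, **CONFIG["icr"]) == 12 under this assumed schedule.
```
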