reproducibilityindex.ai

Efficient PAC-Optimal Exploration in Concurrent, Continuous State MDPs with Delayed Updates

Authors: Jason Pazis, Ronald Parr

AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We present a new, efﬁcient PAC optimal exploration algorithm that is able to explore in multiple, continuous or discrete state MDPs simultaneously. Our algorithm does not assume that value function updates can be completed instantaneously, and maintains PAC guarantees in realtime environments. Not only do we extend the applicability of PAC optimal exploration algorithms to new, realistic settings, but even when instant value function updates are possible, our bounds present a signiﬁcant improvement over previous single MDP exploration bounds, and a drastic improvement over previous concurrent PAC bounds. We also present TCE, a new, ﬁne grained metric for the cost of exploration. From Lemmas 8.1, 8.2, 8.3, and 8.4, and theorem 8.5 below, we have that the combination of Algorithms 1 and 2 is efﬁcient PAC-MDP. Due to space constrains proofs are deferred to the appendix, available at the authors websites.
Researcher Affiliation	Academia	Jason Pazis Laboratory for Information and Decision Systems Massachusetts Institute of Technology Cambridge, MA 02139 jpazis@mit.edu Ronald Parr Department of Computer Science Duke University Durham, NC 27708 parr@cs.duke.edu
Pseudocode	Yes	Algorithm 1 Policy execution (kp instantiations, one for each concurrent MDP): Algorithm 2 Learner
Open Source Code	No	The paper is theoretical and does not mention releasing open-source code for the described methodology.
Open Datasets	No	The paper is theoretical and does not describe training models on specific datasets.
Dataset Splits	No	The paper is theoretical and does not describe dataset splits for empirical validation.
Hardware Specification	No	The paper is theoretical and does not describe any specific hardware used for experiments.
Software Dependencies	No	The paper is theoretical and does not list any specific software dependencies with version numbers.
Experiment Setup	No	The paper is theoretical and does not describe specific experimental setup details, hyperparameters, or training configurations.