Contextual Decision Processes with Low Bellman Rank are PAC-Learnable

Authors: Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper studies systematic exploration for reinforcement learning (RL) with rich observations and function approximation. We introduce contextual decision processes (CDPs) that unify most prior RL settings. Our first contribution is a complexity measure, the Bellman rank, that we show enables tractable learning of near-optimal behavior in CDPs and is naturally small for many well-studied RL models. Our second contribution is a new RL algorithm that performs systematic exploration to learn near-optimal behavior in CDPs with low Bellman rank. The algorithm requires a number of samples that is polynomial in all relevant parameters but independent of the number of unique contexts. Our approach uses Bellman error minimization with optimistic exploration and provides new insights into efficient exploration for RL with function approximation. We prove that OLIVE performs sample-efficient learning in CDPs with a small Bellman rank (Section 4.2).
Researcher Affiliation | Collaboration | ¹University of Michigan, Ann Arbor; ²University of Massachusetts, Amherst; ³Microsoft Research, New York.
Pseudocode | Yes | Pseudocode for our algorithm, OLIVE (Optimism Led Iterative Value-function Elimination), is displayed in Algorithm 1.
Open Source Code | No | The paper does not contain any explicit statement about providing open-source code or a link to a code repository for the described methodology.
Open Datasets | No | The paper is theoretical and does not involve experimental evaluation on a dataset, so no information about public dataset access is provided.
Dataset Splits | No | The paper is theoretical and does not involve experimental evaluation on a dataset, so it provides no details on training/validation/test splits.
Hardware Specification | No | The paper focuses on algorithm design and theoretical guarantees and does not mention any hardware used to run experiments.
Software Dependencies | No | The paper does not specify any software dependencies or version numbers for implementation or experimentation.
Experiment Setup | No | The paper does not provide practical experimental setup details, hyperparameters, or training configurations. It defines parameters for its algorithm (n_est, n_eval, n, φ) derived from theoretical analysis, but these are not empirical settings.
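The OLIVE loop described in the report (optimistic candidate selection, a Bellman-consistency test, elimination of inconsistent candidates) can be sketched in a heavily simplified form. The toy horizon-1 CDP below, the candidate class `F`, and the tolerance `eps` are all illustrative assumptions, not the paper's construction; with horizon 1, a candidate's Bellman error reduces to its average over-prediction of the realized reward under its own greedy policy.

```python
import random

random.seed(0)

# Hypothetical one-step CDP: two contexts, two actions, Bernoulli rewards
# with these means. This is an illustrative toy, not the paper's setting.
TRUE_R = {(0, 0): 0.2, (0, 1): 0.8, (1, 0): 0.7, (1, 1): 0.1}

def sample_context():
    return random.randint(0, 1)

def sample_reward(x, a):
    return 1.0 if random.random() < TRUE_R[(x, a)] else 0.0

def greedy(f, x):
    # Greedy action of candidate value function f in context x.
    return max((0, 1), key=lambda a: f[(x, a)])

def predicted_value(f, n=2000):
    # Monte Carlo estimate of the value f claims its greedy policy achieves.
    return sum(f[(x, greedy(f, x))]
               for x in (sample_context() for _ in range(n))) / n

def bellman_error(f, n=2000):
    # At horizon 1 the Bellman error is the average gap between f's
    # prediction and the realized reward under f's greedy policy.
    total = 0.0
    for _ in range(n):
        x = sample_context()
        a = greedy(f, x)
        total += f[(x, a)] - sample_reward(x, a)
    return total / n

def olive(candidates, eps=0.05):
    survivors = list(candidates)
    while survivors:
        f = max(survivors, key=predicted_value)   # optimistic choice
        if abs(bellman_error(f)) <= eps:          # Bellman-consistent: done
            return f
        # Eliminate every candidate whose estimated Bellman error is large.
        survivors = [g for g in survivors if abs(bellman_error(g)) <= eps]
    return None

# Candidate class: one realizable function plus two over-optimistic guesses.
F = [
    dict(TRUE_R),
    {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.9},
    {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.95},
]

learned = olive(F)
print(greedy(learned, 0), greedy(learned, 1))  # optimal actions per context
```

In this run the optimistic but inconsistent candidates incur a large estimated Bellman error on the first iteration and are eliminated, leaving the realizable candidate, whose greedy policy is optimal. The full algorithm additionally rolls in with the optimistic policy and tests Bellman consistency at each level of the horizon, which this degenerate horizon-1 toy cannot exhibit.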