On Oracle-Efficient PAC RL with Rich Observations
Authors: Christoph Dann, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We present new provably sample-efficient algorithms for environments with deterministic hidden state dynamics and stochastic rich observations. These methods operate in an oracle model of computation accessing policy and value function classes exclusively through standard optimization primitives and therefore represent computationally efficient alternatives to prior algorithms that require enumeration. We also present several examples that illustrate fundamental challenges of tractable PAC reinforcement learning in such general settings. |
| Researcher Affiliation | Collaboration | Christoph Dann, Carnegie Mellon University, Pittsburgh, Pennsylvania (cdann@cdann.net); Nan Jiang, UIUC, Urbana, Illinois (nanjiang@illinois.edu); Akshay Krishnamurthy, Microsoft Research, New York, New York (akshay@cs.umass.edu); Alekh Agarwal, Microsoft Research, Redmond, Washington (alekha@microsoft.com); John Langford, Microsoft Research, New York, New York (jcl@microsoft.com); Robert E. Schapire, Microsoft Research, New York, New York (schapire@microsoft.com) |
| Pseudocode | Yes | The paper includes pseudocode: Algorithm 1 (Main algorithm: VALOR), Algorithm 2 (Subroutine: Policy optimization with local values), and Algorithm 3 (Subroutine: DFS learning of local values). |
| Open Source Code | No | The paper provides no statement or link indicating open-source code availability for the described methodology. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm design and proofs; it does not describe empirical experiments performed on a specific public dataset or provide access information for one. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments or dataset usage, so no training/validation/test splits are provided. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for computations or experiments, consistent with its theoretical nature. |
| Software Dependencies | No | The paper mentions general optimization approaches such as 'standard continuous optimization methods' and 'The ellipsoid method [37]', but it does not specify any particular software, libraries, or version numbers used to implement or execute them. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design; it does not describe empirical experimental setups, hyperparameter values, or training configurations. |