reproducibilityindex.ai

Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems

Authors: Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We propose a new Partially Observable Bilinear Actor-Critic framework, that is general enough to include models such as observable tabular Partially Observable Markov Decision Processes (POMDPs), observable Linear-Quadratic-Gaussian (LQG), Predictive State Representations (PSRs), as well as a newly introduced model Hilbert Space Embeddings of POMDPs and observable POMDPs with latent low-rank transition. Under this framework, we propose an actor-critic style algorithm that is capable of performing agnostic policy learning. ... we demonstrate the scalability and generality of our PO-bilinear actor-critic framework by showing PAC-guarantee on many models as follows (see Table 1 for a summary). Theorem 1 (PAC guarantee of PROVABLE). Theorem 2 (Sample complexity for discrete Π and G (informal)). Theorem 3 (Sample complexity for undercomplete tabular models (Informal)). Theorem 4 (Sample complexity for undercomplete tabular models (Informal) competing against π gl). Theorem 5 (Sample complexity for LQG (informal) competing against π gl). Theorem 6 (Sample complexity for HSE-POMDPs (Informal)).
Researcher Affiliation	Academia	Masatoshi Uehara, Cornell University, mu223@cornell.edu; Ayush Sekhari, MIT, sekhari@mit.edu; Nathan Kallus, Cornell University, kallus@cornell.edu; Jason D. Lee, Princeton University, jasonlee@princeton.edu; Wen Sun, Cornell University, ws455@cornell.edu
Pseudocode	Yes	Algorithm 1 PaRtially ObserVAble BiLinEar (PROVABLE)
Open Source Code	No	The paper does not mention releasing open-source code.
Open Datasets	No	The paper is theoretical and does not describe experiments using specific, publicly available datasets.
Dataset Splits	No	The paper is theoretical and does not discuss dataset splits for training, validation, or testing.
Hardware Specification	No	The paper is theoretical and does not describe experiments, thus no hardware specifications are provided.
Software Dependencies	No	The paper is theoretical and does not specify software dependencies with version numbers.
Experiment Setup	No	The paper is theoretical and does not describe specific experimental setups, hyperparameters, or training configurations.