Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems

Authors: Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun

NeurIPS 2022

Reproducibility Variable Result LLM Response
Research Type Theoretical We propose a new Partially Observable Bilinear Actor-Critic framework that is general enough to include models such as observable tabular Partially Observable Markov Decision Processes (POMDPs), observable Linear-Quadratic-Gaussian (LQG) systems, Predictive State Representations (PSRs), as well as a newly introduced model, Hilbert Space Embeddings of POMDPs, and observable POMDPs with latent low-rank transitions. Under this framework, we propose an actor-critic style algorithm that is capable of performing agnostic policy learning. ... we demonstrate the scalability and generality of our PO-bilinear actor-critic framework by showing PAC guarantees on many models as follows (see Table 1 for a summary). Theorem 1 (PAC guarantee of PROVABLE). Theorem 2 (Sample complexity for discrete Π and G (informal)). Theorem 3 (Sample complexity for undercomplete tabular models (informal)). Theorem 4 (Sample complexity for undercomplete tabular models (informal) competing against π^gl). Theorem 5 (Sample complexity for LQG (informal) competing against π^gl). Theorem 6 (Sample complexity for HSE-POMDPs (informal)).
Researcher Affiliation Academia Masatoshi Uehara (Cornell University, mu223@cornell.edu); Ayush Sekhari (MIT, sekhari@mit.edu); Nathan Kallus (Cornell University, kallus@cornell.edu); Jason D. Lee (Princeton University, jasonlee@princeton.edu); Wen Sun (Cornell University, ws455@cornell.edu)
Pseudocode Yes Algorithm 1: PaRtially ObserVAble BiLinEar (PROVABLE)
Open Source Code No The paper does not mention releasing open-source code.
Open Datasets No The paper is theoretical and does not describe experiments using specific, publicly available datasets.
Dataset Splits No The paper is theoretical and does not discuss dataset splits for training, validation, or testing.
Hardware Specification No The paper is theoretical and does not describe experiments, so no hardware specifications are provided.
Software Dependencies No The paper is theoretical and does not specify software dependencies with version numbers.
Experiment Setup No The paper is theoretical and does not describe specific experimental setups, hyperparameters, or training configurations.
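The Research Type entry names observable tabular POMDPs (with undercomplete tabular models in Theorems 3 and 4) and observable LQG among the models the framework covers. The sketch below is a minimal illustration, under stated assumptions, of what "observable" can mean in these two simplest cases; it is not the paper's algorithm or its exact assumptions. It assumes (a) an undercomplete tabular POMDP counts as observable when its emission matrix, with one column P(· | s) per latent state s, has full column rank, and (b) an LQG system x_{t+1} = A x_t + w_t, y_t = C x_t + v_t counts as observable under the classical Kalman rank condition. The paper's conditions may instead be quantitative (for example, a lower bound on a singular value). All function names are hypothetical.

```python
# Hypothetical helpers (not from the paper): rank-based observability checks
# for an undercomplete tabular POMDP and for a linear (LQG) system.
from fractions import Fraction
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def rank(M: Matrix) -> int:
    """Rank by Gaussian elimination over exact rationals (no round-off)."""
    rows: List[List[Fraction]] = [[Fraction(x) for x in row] for row in M]
    if not rows:
        return 0
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0  # pivots found so far
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Zero out column c in every other row.
        for i in range(n_rows):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r


def matmul(X: Matrix, Y: Matrix) -> List[List[float]]:
    """Plain matrix product X @ Y for lists of lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def is_undercomplete_observable(emission: Matrix) -> bool:
    """emission[o][s] = P(o | s): one row per observation, one column per state.

    Assumed condition: at least as many observations as states, and full
    column rank, so different distributions over latent states induce
    different observation distributions.
    """
    n_obs, n_states = len(emission), len(emission[0])
    return n_obs >= n_states and rank(emission) == n_states


def is_kalman_observable(A: Matrix, C: Matrix) -> bool:
    """Kalman rank test for x_{t+1} = A x_t + w_t, y_t = C x_t + v_t:
    the stacked matrix [C; CA; ...; CA^{n-1}] must have rank n."""
    n = len(A)
    stacked: List[List[float]] = []
    block = [list(row) for row in C]
    for _ in range(n):
        stacked.extend(block)     # append C A^k
        block = matmul(block, A)  # advance to C A^{k+1}
    return rank(stacked) == n


if __name__ == "__main__":
    # Two latent states, three observations; each column sums to 1.
    print(is_undercomplete_observable([[0.8, 0.1], [0.1, 0.1], [0.1, 0.8]]))  # True
    # Both states emit the same distribution, so they cannot be told apart.
    print(is_undercomplete_observable([[0.5, 0.5], [0.5, 0.5]]))  # False
    A = [[1, 1], [0, 1]]  # position-velocity dynamics
    print(is_kalman_observable(A, [[1, 0]]))  # True: observe position
    print(is_kalman_observable(A, [[0, 1]]))  # False: observe velocity only
```

Exact rational arithmetic is used so that a rank deficiency, such as two identical emission columns, is detected exactly rather than being masked by floating-point round-off; a practical check on estimated matrices would instead compare the smallest singular value against a tolerance.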