Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems
Authors: Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We propose a new Partially Observable Bilinear Actor-Critic framework, that is general enough to include models such as observable tabular Partially Observable Markov Decision Processes (POMDPs), observable Linear-Quadratic-Gaussian (LQG), Predictive State Representations (PSRs), as well as a newly introduced model Hilbert Space Embeddings of POMDPs and observable POMDPs with latent low-rank transition. Under this framework, we propose an actor-critic style algorithm that is capable of performing agnostic policy learning. ... we demonstrate the scalability and generality of our PO-bilinear actor-critic framework by showing PAC-guarantee on many models as follows (see Table 1 for a summary). Theorem 1 (PAC guarantee of PROVABLE). Theorem 2 (Sample complexity for discrete Π and G (informal)). Theorem 3 (Sample complexity for undercomplete tabular models (Informal)). Theorem 4 (Sample complexity for undercomplete tabular models (Informal) competing against π_gl). Theorem 5 (Sample complexity for LQG (informal) competing against π_gl). Theorem 6 (Sample complexity for HSE-POMDPs (Informal)). |
| Researcher Affiliation | Academia | Masatoshi Uehara (Cornell University, mu223@cornell.edu); Ayush Sekhari (MIT, sekhari@mit.edu); Nathan Kallus (Cornell University, kallus@cornell.edu); Jason D. Lee (Princeton University, jasonlee@princeton.edu); Wen Sun (Cornell University, ws455@cornell.edu) |
| Pseudocode | Yes | Algorithm 1 PaRtially ObserVAble BiLinEar (PROVABLE) |
| Open Source Code | No | The paper does not mention releasing open-source code. |
| Open Datasets | No | The paper is theoretical and does not describe experiments using specific, publicly available datasets. |
| Dataset Splits | No | The paper is theoretical and does not discuss dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe experiments, thus no hardware specifications are provided. |
| Software Dependencies | No | The paper is theoretical and does not specify software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe specific experimental setups, hyperparameters, or training configurations. |