Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems
Authors: Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We propose a new Partially Observable Bilinear Actor-Critic framework that is general enough to include models such as observable tabular Partially Observable Markov Decision Processes (POMDPs), observable Linear-Quadratic-Gaussian (LQG), Predictive State Representations (PSRs), as well as a newly introduced model, Hilbert Space Embeddings of POMDPs, and observable POMDPs with latent low-rank transition. Under this framework, we propose an actor-critic style algorithm that is capable of performing agnostic policy learning. ... we demonstrate the scalability and generality of our PO-bilinear actor-critic framework by showing PAC guarantees on many models as follows (see Table 1 for a summary). Theorem 1 (PAC guarantee of PROVABLE); Theorem 2 (Sample complexity for discrete Π and G (informal)); Theorem 3 (Sample complexity for undercomplete tabular models (informal)); Theorem 4 (Sample complexity for undercomplete tabular models (informal), competing against π^gl); Theorem 5 (Sample complexity for LQG (informal), competing against π^gl); Theorem 6 (Sample complexity for HSE-POMDPs (informal)). |
| Researcher Affiliation | Academia | Masatoshi Uehara (Cornell University); Ayush Sekhari (MIT); Nathan Kallus (Cornell University); Jason D. Lee (Princeton University); Wen Sun (Cornell University) |
| Pseudocode | Yes | Algorithm 1 PaRtially ObserVAble BiLinEar (PROVABLE) |
| Open Source Code | No | The paper does not mention releasing open-source code. |
| Open Datasets | No | The paper is theoretical and does not describe experiments using specific, publicly available datasets. |
| Dataset Splits | No | The paper is theoretical and does not discuss dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe experiments, thus no hardware specifications are provided. |
| Software Dependencies | No | The paper is theoretical and does not specify software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe specific experimental setups, hyperparameters, or training configurations. |
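The Research Type row refers to observable (undercomplete) tabular POMDPs. As an illustrative sketch only, assuming the common formalisation in which an undercomplete tabular POMDP is called observable when its emission matrix (entry [o][s] = P(o | s), one column per latent state) has full column rank, the check below tests that condition exactly with rational arithmetic. The function names `matrix_rank` and `is_undercomplete_observable` are hypothetical. The paper's actual assumption is quantitative (a margin such as a lower bound on a singular value) rather than the bare rank test shown here, and this code is not the PROVABLE algorithm.

```python
from fractions import Fraction
import math


def matrix_rank(matrix):
    """Rank of a matrix (list of rows), via exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        # Find a pivot at or below the current rank row with a nonzero entry.
        pivot = next((r for r in range(rank, n_rows) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Zero out this column in every row below the pivot.
        for r in range(rank + 1, n_rows):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def is_undercomplete_observable(emission):
    """emission[o][s] = P(observation o | latent state s), shape |O| x |S|.

    Assumed condition: at least as many observations as states, and the
    emission matrix has full column rank, so distinct belief states induce
    distinct observation distributions.
    """
    n_obs, n_states = len(emission), len(emission[0])
    # Each column must be a probability distribution over observations.
    for s in range(n_states):
        col_sum = sum(float(emission[o][s]) for o in range(n_obs))
        if not math.isclose(col_sum, 1.0):
            raise ValueError(f"column {s} does not sum to 1")
    return n_obs >= n_states and matrix_rank(emission) == n_states


# Two states with distinguishable observation distributions: observable.
print(is_undercomplete_observable([[0.9, 0.2], [0.1, 0.8]]))
# Two states that emit identically: not observable.
print(is_undercomplete_observable([[0.5, 0.5], [0.5, 0.5]]))
```

Exact `Fraction` arithmetic is used so that the rank test does not depend on a floating-point tolerance; a quantitative version would instead bound the smallest singular value away from zero.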