Learning in Observable POMDPs, without Computationally Intractable Oracles

Authors: Noah Golowich, Ankur Moitra, Dhruv Rohatgi

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical This work provides an affirmative answer to the questions above: we give an algorithm (Ba Se CAMP, Algorithm 3) with quasi-polynomial time (and sample) complexity for learning near-optimal policies in observable POMDPs see Theorem 3.1. While this falls just short of polynomial time, it turns out to be optimal in the sense that even for observable POMDPs there is a quasi-polynomial time lower bound for the (simpler) planning problem under standard complexity assumptions [GMR22]. (3.a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A] (3.b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [N/A] (3.c) Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? [N/A] (3.d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [N/A]
Researcher Affiliation Academia Noah Golowich MIT nzg@mit.edu Ankur Moitra MIT moitra@mit.edu Dhruv Rohatgi MIT drohatgi@mit.edu
Pseudocode Yes Algorithm 1 Construct MDP; Algorithm 2 Bary Spanner Policy; Algorithm 3 Ba Se CAMP (Barycentric Spanner policy Cover with Approximate MDP)
Open Source Code No The paper is theoretical and focuses on algorithm design and proofs. There is no mention of open-source code for the described methodology. The author checklist section 3.a ('Did you include the code...') is marked N/A.
Open Datasets No The paper is theoretical and does not involve empirical studies with datasets. Therefore, no information regarding publicly available datasets is provided. The author checklist section 3.b ('Did you specify all the training details...') is marked N/A.
Dataset Splits No The paper is theoretical and does not describe any empirical experiments or dataset usage, hence no training, validation, or test splits are mentioned. The author checklist section 3.b ('Did you specify all the training details...') is marked N/A.
Hardware Specification No The paper is theoretical and does not describe empirical experiments. Consequently, no hardware specifications are mentioned for running any experiments. The author checklist section 3.d ('Did you include the total amount of compute...') is marked N/A.
Software Dependencies No The paper is theoretical and does not describe empirical experiments or their implementation details. Therefore, no specific software dependencies with version numbers are mentioned. The author checklist section 3.b ('Did you specify all the training details...') is marked N/A.
Experiment Setup No The paper is theoretical and does not include an experimental setup section with details like hyperparameters or training configurations for empirical evaluation. The author checklist section 3.b ('Did you specify all the training details...') is marked N/A.