Learning in POMDPs is Sample-Efficient with Hindsight Observability
Authors: Jonathan Lee, Alekh Agarwal, Christoph Dann, Tong Zhang
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We introduce new algorithms for the tabular and function approximation settings that are provably sample-efficient with hindsight observability, even in POMDPs that would otherwise be statistically intractable. We give a lower bound showing that the tabular algorithm is optimal in its dependence on latent state and observation cardinalities. |
| Researcher Affiliation | Collaboration | Stanford University, Google Research, HKUST. |
| Pseudocode | Yes | Algorithm 1 Hindsight OPtimism with Bonus (HOP-B) ... Algorithm 2 Hindsight OPtimism with Version Spaces (HOP-V) (see the illustrative sketch after this table). |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-sourcing code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not report on empirical experiments using specific datasets for training. |
| Dataset Splits | No | The paper is theoretical and does not report on empirical experiments, thus no dataset splits for validation are specified. |
| Hardware Specification | No | The paper is theoretical and does not report on empirical experiments, thus no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not report on empirical experiments, thus no software dependencies with version numbers are specified. |
| Experiment Setup | No | The paper is theoretical and does not report on empirical experiments, thus no experimental setup details like hyperparameters are provided. |
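
The paper's HOP-B pseudocode is not reproduced here. As a rough illustration of the hindsight-observability protocol it relies on (latent states are revealed only after each episode and used to fit the emission and latent-transition models), a minimal tabular sketch in Python might look like the following. The `env` interface (`reset`, `step`, `reveal_latent_states`), the uniform-prior belief rule, and the simple count-based bonus are assumptions for illustration only, not the authors' algorithm or bonus construction.

```python
import numpy as np

def hindsight_optimistic_learning(env, S, A, O, H, num_episodes, c_bonus=1.0):
    """Illustrative sketch: S latent states, A actions, O observations, horizon H."""
    trans_counts = np.full((S, A, S), 1e-3)   # smoothed latent-transition counts
    emit_counts = np.full((S, O), 1e-3)       # smoothed emission counts
    reward_sums = np.zeros((S, A))
    visit_counts = np.zeros((S, A))

    for _ in range(num_episodes):
        # Optimistic value iteration in the estimated latent MDP with a
        # count-based exploration bonus (a crude stand-in for the paper's bonuses).
        P_hat = trans_counts / trans_counts.sum(axis=2, keepdims=True)
        O_hat = emit_counts / emit_counts.sum(axis=1, keepdims=True)
        r_hat = reward_sums / np.maximum(visit_counts, 1)
        bonus = c_bonus / np.sqrt(np.maximum(visit_counts, 1))
        Q = np.zeros((H + 1, S, A))
        for h in reversed(range(H)):
            V_next = Q[h + 1].max(axis=1)
            Q[h] = np.clip(r_hat + bonus + P_hat @ V_next, 0.0, H)

        # Execute: only observations are visible during the episode, so the agent
        # acts through a posterior over latent states from the estimated emissions.
        obs = env.reset()                      # hypothetical: returns an observation index
        obs_seq, act_seq, rew_seq = [], [], []
        for h in range(H):
            belief = O_hat[:, obs] / O_hat[:, obs].sum()
            a = int((belief @ Q[h]).argmax())
            obs_next, r, done = env.step(a)    # hypothetical step interface
            obs_seq.append(obs); act_seq.append(a); rew_seq.append(r)
            obs = obs_next
            if done:
                break

        # Hindsight observability: the latent states visited in the episode are
        # revealed after the fact and used to update all counts directly.
        latents = env.reveal_latent_states()   # hypothetical hindsight oracle
        for h, a in enumerate(act_seq):
            s, s_next = latents[h], latents[h + 1]
            trans_counts[s, a, s_next] += 1
            emit_counts[s, obs_seq[h]] += 1
            reward_sums[s, a] += rew_seq[h]
            visit_counts[s, a] += 1
    return Q
```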