Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings

Authors: Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We consider experiments using grid-world environments where we observe noisy observations of the latent state due to imperfect sensors in Section G. As mentioned in Section 1, this experiment is motivated by practical scenarios in autonomous driving. Similar experimental settings are considered in Du et al. (2019). We demonstrate our proposed method can return the optimal policy with low sample complexity.
Researcher Affiliation | Academia | Cornell University; Princeton University.
Pseudocode | Yes | Algorithm 1 Efficient Q-learning for Deterministic POMDPs (EQDP) ... Algorithm 2 Compute-V
Open Source Code | No | The paper links to an external environment implementation ('https://github.com/openai/gym/blob/master/gym/envs/toy_text/cliffwalking.py') but does not state that code for the methodology described in the paper is open-source, nor provide a link to such code.
Open Datasets | Yes | We consider the environment, cliff walking, as illustrated in Figure 2a. The implementation of the environment is given in https://github.com/openai/gym/blob/master/gym/envs/toy_text/cliffwalking.py.
Dataset Splits | No | The paper describes an RL environment (cliff walking) and the setup for an agent interacting with it, but it does not specify training/validation/test splits as percentages or sample counts; the task involves online interaction rather than a pre-collected static dataset.
Hardware Specification | No | The paper discusses experiments in Section G but does not provide any specific hardware details such as GPU or CPU models, memory, or cloud computing resources used.
Software Dependencies | No | The paper mentions the use of 'openai/gym' for the environment implementation, but it does not provide version numbers for this or any other software dependency, such as the programming language or libraries.
Experiment Setup | Yes | Our proposal. We set M = 50, ϵ = 0.6, λ = 0.1 unless otherwise noted. We confirm our proposal is robust to hyperparameters as will be discussed later. ... We set H = 20 and vary α (0, 0.1, 0.2, 0.3, 0.4). ... We set α = 0.3 and vary H (5, 12, 20, 30, 40). ... We set H = 20, α = 0.3. We vary M (20, 50, 100, 200).
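For concreteness, the cliff-walking environment quoted in the Open Datasets row can be sketched in a few lines. This is a minimal stand-in following the standard 4x12 layout used by the linked gym implementation (Sutton & Barto style: -1 per step, -100 and a reset for stepping onto the cliff); the grid size, rewards, and function names here are assumptions for illustration, not details taken from the paper's Section G.

```python
# Minimal sketch of a 4x12 cliff-walking grid (layout and rewards assumed,
# following the standard gym CliffWalking environment, not the paper itself).
GRID_H, GRID_W = 4, 12
START, GOAL = (3, 0), (3, 11)
CLIFF = {(3, c) for c in range(1, 11)}  # bottom row between start and goal

ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action):
    """Deterministic transition; returns (next_state, reward, done)."""
    dr, dc = ACTIONS[action]
    r = min(max(state[0] + dr, 0), GRID_H - 1)  # clip to the grid
    c = min(max(state[1] + dc, 0), GRID_W - 1)
    if (r, c) in CLIFF:
        return START, -100, False  # fall off the cliff, return to start
    return (r, c), -1, (r, c) == GOAL
```

Running the optimal policy (up, 11 steps right, down) under this sketch yields a return of -13, which is the usual benchmark optimum for this grid.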
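The defaults and sweep grids quoted in the Experiment Setup row can be collected into plain dictionaries, which makes the three ablations (vary α, vary H, vary M) explicit. The symbol names (M, eps, lam, H, alpha) follow the paper's notation; the helper function and its name are illustrative, not from the paper.

```python
# Defaults and sweeps as quoted in the Experiment Setup row; structure is
# an illustrative assumption, only the values come from the paper's text.
DEFAULTS = {"M": 50, "eps": 0.6, "lam": 0.1, "H": 20, "alpha": 0.3}

SWEEPS = {
    "alpha": [0.0, 0.1, 0.2, 0.3, 0.4],  # with H = 20 fixed
    "H": [5, 12, 20, 30, 40],            # with alpha = 0.3 fixed
    "M": [20, 50, 100, 200],             # with H = 20, alpha = 0.3 fixed
}

def configs_for(param):
    """Yield one full config per swept value, others held at defaults."""
    for value in SWEEPS[param]:
        yield {**DEFAULTS, param: value}
```

For example, `list(configs_for("M"))` produces four configurations that differ only in M, matching the robustness check described in the quoted setup.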