Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings
Authors: Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We consider experiments using grid-world environments where we observe noisy observations of the latent state due to imperfect sensors in Section G. As mentioned in Section 1, this experiment is motivated by practical scenarios in autonomous driving. Similar experimental settings are considered in Du et al. (2019). We demonstrate our proposed method can return the optimal policy with low sample complexity. |
| Researcher Affiliation | Academia | ¹Cornell University, ²Princeton University. |
| Pseudocode | Yes | Algorithm 1 Efficient Q-learning for Deterministic POMDPs (EQDP) ... Algorithm 2 Compute-V |
| Open Source Code | No | The paper provides a link to an external environment implementation ('https://github.com/openai/gym/blob/master/gym/envs/toy_text/cliffwalking.py') but does not state that the code for the methodology described in this paper is open-source or provide a link to it. |
| Open Datasets | Yes | We consider the environment, cliff walking as illustrated in Figure 2a. The implementation of the environment is given in https://github.com/openai/gym/blob/master/gym/envs/toy_text/cliffwalking.py. |
| Dataset Splits | No | The paper describes an RL environment (cliff walking) and the experimental setup for an agent interacting with it, but it does not specify explicit training/test/validation dataset splits in terms of percentages or sample counts, as it's not a pre-collected static dataset. |
| Hardware Specification | No | The paper discusses experiments in Section G but does not provide any specific hardware details such as GPU or CPU models, memory, or cloud computing resources used. |
| Software Dependencies | No | The paper mentions the use of 'openai/gym' for the environment implementation, but it does not provide specific version numbers for this or any other software dependencies like programming languages or libraries. |
| Experiment Setup | Yes | Our proposal. We set M = 50, ϵ = 0.6, λ = 0.1 unless otherwise noted. We confirm our proposal is robust to hyperparameters as will be discussed later. ... We set H = 20 and vary α (0, 0.1, 0.2, 0.3, 0.4). ... We set α = 0.3 and vary H (5, 12, 20, 30, 40). ... We set H = 20, α = 0.3. We vary M (20, 50, 100, 200). |
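
The three sweeps quoted in the Experiment Setup row can be written out as an explicit configuration grid. The following is a minimal sketch, not the authors' code: the `Setup` class, its field names, and the `sweeps` helper are hypothetical, and it assumes that any parameter not named in a sweep keeps the stated default (M = 50, ϵ = 0.6, λ = 0.1) or the fixed value quoted alongside the other sweeps (H = 20, α = 0.3). The symbols themselves are not defined in this summary; see the paper for what each one means.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Setup:
    """One experiment configuration (hypothetical container, not from the paper)."""
    M: int = 50           # default quoted in the setup row
    epsilon: float = 0.6  # default quoted in the setup row
    lam: float = 0.1      # lambda; default quoted in the setup row
    H: int = 20           # value held fixed in the alpha and M sweeps
    alpha: float = 0.3    # value held fixed in the H and M sweeps


def sweeps() -> dict:
    """Return the three one-parameter sweeps listed in the setup row."""
    base = Setup()
    return {
        # H = 20, vary alpha
        "vary_alpha": [replace(base, alpha=a) for a in (0.0, 0.1, 0.2, 0.3, 0.4)],
        # alpha = 0.3, vary H
        "vary_H": [replace(base, H=h) for h in (5, 12, 20, 30, 40)],
        # H = 20, alpha = 0.3, vary M
        "vary_M": [replace(base, M=m) for m in (20, 50, 100, 200)],
    }


if __name__ == "__main__":
    for name, configs in sweeps().items():
        print(name, len(configs))
```

Each sweep changes one parameter at a time around the same base point, so the base configuration (M = 50, H = 20, α = 0.3) appears once in every sweep.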