Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings

Authors: Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We consider experiments using grid-world environments where we observe noisy observations of the latent state due to imperfect sensors in Section G. As mentioned in Section 1, this experiment is motivated by practical scenarios in autonomous driving. Similar experimental settings are considered in Du et al. (2019). We demonstrate our proposed method can return the optimal policy with low sample complexity.
Researcher Affiliation | Academia | Cornell University; Princeton University.
Pseudocode | Yes | Algorithm 1 Efficient Q-learning for Deterministic POMDPs (EQDP) ... Algorithm 2 Compute-V
Open Source Code | No | The paper links to an external environment implementation ('https://github.com/openai/gym/blob/master/gym/envs/toy_text/cliffwalking.py') but does not state that code for the methodology described in the paper is open-source, nor provide a link to such code.
Open Datasets | Yes | We consider the environment, cliff walking, as illustrated in Figure 2a. The implementation of the environment is given in https://github.com/openai/gym/blob/master/gym/envs/toy_text/cliffwalking.py.
Dataset Splits | No | The paper describes an RL environment (cliff walking) and the setup for an agent interacting with it, but it does not specify training/validation/test splits as percentages or sample counts; the task involves online interaction rather than a pre-collected static dataset.
Hardware Specification | No | The paper discusses experiments in Section G but does not provide any specific hardware details such as GPU or CPU models, memory, or cloud computing resources used.
Software Dependencies | No | The paper mentions the use of 'openai/gym' for the environment implementation, but it does not provide version numbers for this or any other software dependency, such as the programming language or libraries.
Experiment Setup | Yes | Our proposal. We set M = 50, ϵ = 0.6, λ = 0.1 unless otherwise noted. We confirm our proposal is robust to hyperparameters as will be discussed later. ... We set H = 20 and vary α (0, 0.1, 0.2, 0.3, 0.4). ... We set α = 0.3 and vary H (5, 12, 20, 30, 40). ... We set H = 20, α = 0.3. We vary M (20, 50, 100, 200).
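For concreteness, the cliff-walking environment quoted in the Open Datasets row can be sketched in a few lines. This is a minimal stand-in following the standard 4x12 layout used by the linked gym implementation (Sutton & Barto style: -1 per step, -100 and a reset for stepping onto the cliff); the grid size, rewards, and function names here are assumptions for illustration, not details taken from the paper's Section G.

```python
# Minimal sketch of a 4x12 cliff-walking grid (layout and rewards assumed,
# following the standard gym CliffWalking environment, not the paper itself).
GRID_H, GRID_W = 4, 12
START, GOAL = (3, 0), (3, 11)
CLIFF = {(3, c) for c in range(1, 11)}  # bottom row between start and goal

ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action):
    """Deterministic transition; returns (next_state, reward, done)."""
    dr, dc = ACTIONS[action]
    r = min(max(state[0] + dr, 0), GRID_H - 1)  # clip to the grid
    c = min(max(state[1] + dc, 0), GRID_W - 1)
    if (r, c) in CLIFF:
        return START, -100, False  # fall off the cliff, return to start
    return (r, c), -1, (r, c) == GOAL
```

Running the optimal policy (up, 11 steps right, down) under this sketch yields a return of -13, which is the usual benchmark optimum for this grid.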
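The defaults and sweep grids quoted in the Experiment Setup row can be collected into plain dictionaries, which makes the three ablations (vary α, vary H, vary M) explicit. The symbol names (M, eps, lam, H, alpha) follow the paper's notation; the helper function and its name are illustrative, not from the paper.

```python
# Defaults and sweeps as quoted in the Experiment Setup row; structure is
# an illustrative assumption, only the values come from the paper's text.
DEFAULTS = {"M": 50, "eps": 0.6, "lam": 0.1, "H": 20, "alpha": 0.3}

SWEEPS = {
    "alpha": [0.0, 0.1, 0.2, 0.3, 0.4],  # with H = 20 fixed
    "H": [5, 12, 20, 30, 40],            # with alpha = 0.3 fixed
    "M": [20, 50, 100, 200],             # with H = 20, alpha = 0.3 fixed
}

def configs_for(param):
    """Yield one full config per swept value, others held at defaults."""
    for value in SWEEPS[param]:
        yield {**DEFAULTS, param: value}
```

For example, `list(configs_for("M"))` produces four configurations that differ only in M, matching the robustness check described in the quoted setup.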