Learning Belief Representations for Partially Observable Deep RL

Authors: Andrew Wang, Andrew C. Li, Toryn Q. Klassen, Rodrigo Toro Icarte, Sheila A. McIlraith

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments demonstrate the efficacy of our approach on partially observable domains requiring information seeking and long-term memory. We demonstrate the advantages of our method through experiments and ablations on image-based MiniGrid environments (Chevalier-Boisvert et al., 2018) as well as a continuous-control environment with high-dimensional image observations."
Researcher Affiliation | Academia | "1 Department of Computer Science, University of Toronto; 2 Vector Institute; 3 Schwartz Reisman Institute for Technology and Society; 4 Pontificia Universidad Católica de Chile; 5 Centro Nacional de Inteligencia Artificial."
Pseudocode | Yes | "Algorithm 1: Learning compact state representations."
Open Source Code | Yes | Code available at https://github.com/awwang10/sphinx.
Open Datasets | No | "For each evaluation environment, we collect a small amount of offline data from a random-action policy. We use this dataset in Believer to learn state representations (Section 4.1) and to pretrain the belief state VAE (Section 4.2)." The paper uses custom-built environments (Sphinx, Cookie, Escape Room) and collects its own data, without providing explicit access information for the collected dataset itself.
Dataset Splits | No | The paper mentions collecting "offline data from a random-action policy" for pretraining but does not specify train/validation/test splits for this collected data. The hyperparameter tables mention "Minibatch Size" but not dataset splits.
Hardware Specification | No | The paper mentions "GPU memory" in Section 5.1 but does not provide specific hardware details such as GPU models (e.g., NVIDIA A100), CPU models, or memory amounts used for experiments.
Software Dependencies | No | The paper does not list specific software dependencies along with their version numbers (e.g., "Python 3.8, PyTorch 1.9, and CUDA 11.1") that would be required for replication.
Experiment Setup | Yes | Appendix C (Hyperparameters) provides detailed tables (Tables 1–5) listing specific hyperparameter values for each environment, including the discount factor, learning rates, batch sizes, number of epochs, various loss coefficients, and latent dimensions.