Learning Belief Representations for Partially Observable Deep RL

Authors: Andrew Wang, Andrew C. Li, Toryn Q. Klassen, Rodrigo Toro Icarte, Sheila A. McIlraith

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments demonstrate the efficacy of our approach on partially observable domains requiring information seeking and long-term memory. We demonstrate the advantages of our method through experiments and ablations on image-based MiniGrid environments (Chevalier-Boisvert et al., 2018) as well as a continuous-control environment with high-dimensional image observations."
Researcher Affiliation | Academia | "1 Department of Computer Science, University of Toronto; 2 Vector Institute; 3 Schwartz Reisman Institute for Technology and Society; 4 Pontificia Universidad Católica de Chile; 5 Centro Nacional de Inteligencia Artificial."
Pseudocode | Yes | "Algorithm 1: Learning compact state representations."
Open Source Code | Yes | Code available at https://github.com/awwang10/sphinx.
Open Datasets | No | "For each evaluation environment, we collect a small amount of offline data from a random-action policy. We use this dataset in Believer to learn state representations (Section 4.1) and to pretrain the belief state VAE (Section 4.2)." The paper uses custom-built environments (Sphinx, Cookie, Escape Room) and collects its own data, without providing explicit access information for the collected dataset itself.
Dataset Splits | No | The paper mentions collecting "offline data from a random-action policy" for pretraining but does not specify train/validation/test splits for this collected data. The hyperparameter tables mention "Minibatch Size" but not dataset splits.
Hardware Specification | No | The paper mentions "GPU memory" in Section 5.1 but does not provide specific hardware details such as GPU models (e.g., NVIDIA A100), CPU models, or memory amounts used for experiments.
Software Dependencies | No | The paper does not list specific software dependencies along with their version numbers (e.g., "Python 3.8, PyTorch 1.9, and CUDA 11.1") that would be required for replication.
Experiment Setup | Yes | Appendix C (Hyperparameters) provides detailed tables (Tables 1–5) listing specific hyperparameter values for each environment, including the discount factor, learning rates, batch sizes, number of epochs, various loss coefficients, and latent dimensions.