Flow-based Recurrent Belief State Learning for POMDPs

Authors: Xiaoyu Chen, Yao Mark Mu, Ping Luo, Shengbo Li, Jianyu Chen

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments, we show that our methods successfully capture the complex belief states that enable multi-modal predictions as well as high quality reconstructions, and results on challenging visual-motor control tasks show that our method achieves superior performance and sample efficiency.
Researcher Affiliation | Academia | (1) Institute for Interdisciplinary Information Sciences, Tsinghua University; (2) Department of Computer Science, The University of Hong Kong; (3) School of Vehicle and Mobility, Tsinghua University; (4) Shanghai Qizhi Institute.
Pseudocode | Yes | Algorithm 1: FORBES Algorithm
Open Source Code | No | The paper does not explicitly state that the source code for their methodology is publicly available, nor does it provide a link to a code repository.
Open Datasets | Yes | We adopt the MNIST Sequence Dataset (D. De Jong, 2016), which consists of sequences of handwritten MNIST digit strokes, as well as visual-motor control tasks from the DeepMind Control Suite (Tassa et al., 2018).
Dataset Splits | No | The paper mentions training and testing on datasets but does not provide specific train/validation/test dataset splits (e.g., percentages or counts).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments, only general mentions of neural networks.
Software Dependencies | No | The paper mentions specific algorithms and optimizers used (e.g., GRU, Adam) but does not list specific software libraries or packages with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | Hyperparameters: For DMControl tasks, we pre-process images by reducing the bit depth to 5 bits and draw batches of 50 sequences of length 50 to train the FORBES model, value model, and action model using Adam (Kingma & Ba, 2014) with learning rates α0 = 5 × 10⁻⁴, α1 = 8 × 10⁻⁵, and α2 = 8 × 10⁻⁵, respectively, and scale down gradient norms that exceed 100. We clip the KL regularizers in J^Model below 3.0 free nats as in Dreamer and PlaNet. The imagination horizon is H = 15 and the same trajectories are used to update both action and value models. We compute the TD-λ targets with γ = 0.99 and λ = 0.95. As for multiple imagined trajectories, we choose N = 4 across all environments.
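
The hyperparameters quoted in the Experiment Setup row above are complete enough to restate as a configuration object. The sketch below is an illustrative reconstruction, not the authors' code (none is released per the Open Source Code row): the names `ForbesConfig` and `preprocess_obs`, and the exact pixel-preprocessing formula, are assumptions in the style of PlaNet/Dreamer pipelines.

```python
# Illustrative training configuration matching the hyperparameters quoted above.
# NOTE: names and the preprocessing formula are assumptions; the FORBES code is not public.
from dataclasses import dataclass

import numpy as np


@dataclass
class ForbesConfig:
    # Model training data
    batch_size: int = 50            # batches of 50 sequences
    sequence_length: int = 50       # each sequence has length 50
    bit_depth: int = 5              # images reduced to 5-bit depth
    # Adam learning rates for the FORBES model, value model, and action model
    model_lr: float = 5e-4          # alpha_0
    value_lr: float = 8e-5          # alpha_1
    action_lr: float = 8e-5         # alpha_2
    grad_clip_norm: float = 100.0   # scale down gradient norms that exceed 100
    free_nats: float = 3.0          # clip KL regularizers below 3.0 free nats
    # Behavior learning by latent imagination
    imagination_horizon: int = 15   # H
    discount: float = 0.99          # gamma for TD-lambda targets
    td_lambda: float = 0.95         # lambda for TD-lambda targets
    num_imagined_trajectories: int = 4  # N, same across all environments


def preprocess_obs(image: np.ndarray, bit_depth: int = 5) -> np.ndarray:
    """Reduce the bit depth of uint8 images and center them to roughly [-0.5, 0.5].

    This exact formula is an assumption borrowed from common PlaNet/Dreamer
    preprocessing; the paper only states that bit depth is reduced to 5 bits.
    """
    quantized = np.floor(image.astype(np.float32) / 2 ** (8 - bit_depth)) / 2 ** bit_depth
    return quantized - 0.5
```

A typical reproduction would pair such a configuration with tasks loaded through the DeepMind Control Suite, e.g. `dm_control.suite.load(domain_name=..., task_name=...)`, feeding rendered pixel observations through the preprocessing step before model training.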