Flow-based Recurrent Belief State Learning for POMDPs
Authors: Xiaoyu Chen, Yao Mark Mu, Ping Luo, Shengbo Li, Jianyu Chen
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, we show that our methods successfully capture the complex belief states that enable multi-modal predictions as well as high quality reconstructions, and results on challenging visual-motor control tasks show that our method achieves superior performance and sample efficiency. |
| Researcher Affiliation | Academia | 1Institute for Interdisciplinary Information Sciences, Tsinghua University 2Department of Computer Science, The University of Hong Kong 3School of Vehicle and Mobility, Tsinghua University 4Shanghai Qizhi Institute. |
| Pseudocode | Yes | Algorithm 1 FORBES Algorithm |
| Open Source Code | No | The paper does not explicitly state that the source code for its methodology is publicly available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We adopt the MNIST Sequence Dataset (D. De Jong, 2016), which consists of sequences of handwritten MNIST digit strokes, and visual-motor control tasks from the DeepMind Control Suite (Tassa et al., 2018). (An environment-loading sketch appears below the table.) |
| Dataset Splits | No | The paper mentions training and testing on datasets but does not provide specific train/validation/test dataset splits (e.g., percentages or counts). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments, only general mentions of neural networks. |
| Software Dependencies | No | The paper mentions specific algorithms and optimizers used (e.g., GRU, Adam) but does not list specific software libraries or packages with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Hyperparameters: For DMControl tasks, we pre-process images by reducing the bit depth to 5 bits and draw batches of 50 sequences of length 50 to train the FORBES model, value model, and action model using Adam (Kingma & Ba, 2014) with learning rates α0 = 5×10⁻⁴, α1 = 8×10⁻⁵, and α2 = 8×10⁻⁵, respectively, and scale down gradient norms that exceed 100. We clip the KL regularizers in J_model below 3.0 free nats as in Dreamer and PlaNet. The imagination horizon is H = 15 and the same trajectories are used to update both action and value models. We compute the TD-λ targets with γ = 0.99 and λ = 0.95. As for multiple imagined trajectories, we choose N = 4 across all environments. (Hedged sketches of this setup appear below the table.) |
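
The Open Datasets row points to the DeepMind Control Suite, which is available through the open-source `dm_control` package. A minimal loading sketch follows; the walker-walk task and 64×64 render size are illustrative assumptions, not choices quoted from the paper:

```python
from dm_control import suite

# Load a visual-motor control task; walker-walk is an assumed example,
# not a task list taken from the paper.
env = suite.load(domain_name="walker", task_name="walk")
time_step = env.reset()

# Render pixel observations for the image-based agent; 64x64 is a common
# choice in PlaNet/Dreamer-style pipelines, assumed here for illustration.
pixels = env.physics.render(height=64, width=64, camera_id=0)
```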
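
The Experiment Setup row quotes image preprocessing and optimization hyperparameters. Below is a minimal sketch of 5-bit depth reduction in the PlaNet/Dreamer convention the paper cites (the function name and centering to [-0.5, 0.5) are our assumptions), plus a config dict collecting the quoted values under our own key names:

```python
import numpy as np

def preprocess(image, bit_depth=5):
    # Quantize 8-bit pixels down to `bit_depth` bits, rescale to [0, 1),
    # and center around zero, as in PlaNet/Dreamer-style pipelines.
    image = np.floor(image / 2 ** (8 - bit_depth)) / 2 ** bit_depth
    return image - 0.5

# Values as quoted in the paper; the dict keys are hypothetical naming.
FORBES_CONFIG = {
    "batch_size": 50,           # batches of 50 sequences
    "sequence_length": 50,      # of length 50
    "lr_model": 5e-4,           # α0, FORBES model (Adam)
    "lr_value": 8e-5,           # α1, value model (Adam)
    "lr_action": 8e-5,          # α2, action model (Adam)
    "grad_clip_norm": 100.0,    # scale down gradient norms exceeding 100
    "free_nats": 3.0,           # KL clipping, as in Dreamer and PlaNet
    "imagination_horizon": 15,  # H
    "gamma": 0.99,              # discount for TD-λ targets
    "lambda": 0.95,             # λ for TD-λ targets
    "num_trajectories": 4,      # N imagined trajectories
}
```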
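
The same row mentions TD-λ targets with γ = 0.99 and λ = 0.95. The paper excerpt does not spell out the recursion, so the sketch below assumes the standard Dreamer-style λ-return, V_λ(t) = r_t + γ((1−λ)v(s_{t+1}) + λV_λ(t+1)), bootstrapped from the value model at the horizon:

```python
def lambda_returns(rewards, values, gamma=0.99, lam=0.95):
    """TD-λ targets over an imagined trajectory of horizon H.

    rewards: list of length H; values: list of length H + 1, whose final
    entry is the bootstrap value estimate at the horizon.
    """
    H = len(rewards)
    targets = [0.0] * H
    last = values[H]  # bootstrap from the value model at the horizon
    for t in reversed(range(H)):
        last = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * last)
        targets[t] = last
    return targets
```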