Deep Variational Reinforcement Learning for POMDPs
Authors: Maximilian Igl, Luisa Zintgraf, Tuan Anh Le, Frank Wood, Shimon Whiteson
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate DVRL on Mountain Hike and on flickering Atari. We show that DVRL deals better with noisy or partially occluded observations and that this scales to high dimensional and continuous observation spaces like images and complex tasks. We also perform a series of ablation studies, showing the importance of using many particles, including the ELBO training objective in the loss function, and jointly optimising the ELBO and RL losses. (A hedged sketch of this joint loss appears below the table.) |
| Researcher Affiliation | Academia | ¹University of Oxford, United Kingdom; ²University of British Columbia, Canada. |
| Pseudocode | No | The paper describes the DVRL algorithm conceptually and with equations, but does not present it in a formal pseudocode or algorithm block. (A schematic, hedged sketch of a particle-filter-style belief update appears below the table.) |
| Open Source Code | No | The paper does not contain any statements about making the source code available or provide links to a code repository. |
| Open Datasets | Yes | We evaluate DVRL on Mountain Hike and on flickering Atari. Atari environments (Bellemare et al., 2013) provide a wide set of challenging tasks with high dimensional observation spaces. |
| Dataset Splits | No | The paper describes generating trajectories through interaction in reinforcement learning environments ("n_e parallel environments") but does not specify explicit train/validation/test dataset splits like those used in supervised learning. |
| Hardware Specification | Yes | The NVIDIA DGX-1 used for this research was donated by the NVIDIA corporation. |
| Software Dependencies | No | The paper mentions PyTorch and GRUs but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | DVRL used 30 particles and we set n_g = 25 for both RNN and DVRL. The latent state h for the RNN-encoder architecture was of dimension 256, and 128 for both z and h for DVRL. Lastly, λ^E = 1 and n_s = 5 were used, together with RMSProp with a learning rate of 10^-4 for both approaches. (These values are collected in a hedged configuration sketch below the table.) |
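
The joint optimisation mentioned in the Research Type row can be pictured as a single scalar loss: an A2C-style RL loss plus an ELBO term weighted by the λ^E coefficient quoted in the experiment setup. The sketch below is an illustration only; the function and argument names (`dvrl_total_loss`, `value_coef`, `entropy_coef`, `neg_elbo`) are hypothetical, and only the default `lambda_e = 1.0` reflects a value reported in the paper.

```python
def dvrl_total_loss(policy_loss, value_loss, entropy, neg_elbo,
                    value_coef=0.5, entropy_coef=0.01, lambda_e=1.0):
    """Illustrative combination of an A2C-style RL loss with an ELBO term.

    `lambda_e` corresponds to the lambda^E = 1 coefficient from the paper's
    experiment setup; `value_coef` and `entropy_coef` are placeholders.
    """
    rl_loss = policy_loss + value_coef * value_loss - entropy_coef * entropy
    # The negative ELBO is added so that minimising the total loss jointly
    # improves the policy/value estimates and the generative model.
    return rl_loss + lambda_e * neg_elbo
```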
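
Since the paper gives no formal algorithm block, the following is a schematic, hedged reconstruction of a particle-filter-style belief update of the kind DVRL describes with equations: resample particles, propose a latent per particle, reweight with transition and observation models, and accumulate a per-step ELBO contribution. The module names (`proposal`, `transition`, `decoder`, `rnn`), tensor shapes, and aggregation details are assumptions, not the authors' implementation.

```python
import math
import torch
import torch.nn.functional as F

def belief_update(particles, action, obs, proposal, transition, decoder, rnn):
    """One schematic belief update over K particles (not the authors' code).

    particles: dict with
        'h':     (K, H) deterministic RNN states
        'log_w': (K,)   log importance weights
    action, obs: tensors already tiled per particle, shapes (K, A) and (K, O).
    Returns the updated particles and this step's ELBO contribution.
    """
    h, log_w = particles['h'], particles['log_w']
    K = h.size(0)

    # 1. Resample ancestor particles according to the normalised weights.
    probs = F.softmax(log_w, dim=0)
    idx = torch.multinomial(probs, K, replacement=True)
    h = h[idx]

    # 2. Propose a new latent per particle from the encoder/proposal network.
    z, log_q = proposal(h, action, obs)          # sample and its log-density

    # 3. Reweight with the transition prior and observation likelihood.
    log_p_z = transition(h, action).log_prob(z)
    log_p_o = decoder(h, z).log_prob(obs)
    log_w = log_p_z + log_p_o - log_q

    # 4. Advance the deterministic state with an RNN step (e.g. a GRU cell).
    h = rnn(torch.cat([z, action, obs], dim=-1), h)

    # 5. The per-step ELBO contribution is the log of the average weight.
    elbo_t = torch.logsumexp(log_w, dim=0) - math.log(K)

    return {'h': h, 'log_w': log_w}, elbo_t
```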
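
The quoted hyperparameters from the Experiment Setup row can be collected into a single configuration object. Only the numerical values come from the paper; the field names below (`num_particles`, `n_g`, `lambda_e`, and so on) are illustrative.

```python
from dataclasses import dataclass

@dataclass
class DVRLConfig:
    """Hyperparameters quoted in the paper's experiment setup (values only)."""
    num_particles: int = 30      # "DVRL used 30 particles"
    n_g: int = 25                # n_g = 25 for both RNN and DVRL
    rnn_latent_dim: int = 256    # latent state h of the RNN-encoder baseline
    dvrl_latent_dim: int = 128   # dimension of both z and h for DVRL
    lambda_e: float = 1.0        # weight of the ELBO term in the joint loss
    n_s: int = 5                 # n_s = 5
    learning_rate: float = 1e-4  # RMSProp learning rate for both approaches

config = DVRLConfig()
```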