Deep Variational Reinforcement Learning for POMDPs
Authors: Maximilian Igl, Luisa Zintgraf, Tuan Anh Le, Frank Wood, Shimon Whiteson
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate DVRL on Mountain Hike and on flickering Atari. We show that DVRL deals better with noisy or partially occluded observations and that this scales to high dimensional and continuous observation spaces like images and complex tasks. We also perform a series of ablation studies, showing the importance of using many particles, including the ELBO training objective in the loss function, and jointly optimising the ELBO and RL losses. (A hedged sketch of this joint loss appears below the table.) |
| Researcher Affiliation | Academia | ¹University of Oxford, United Kingdom; ²University of British Columbia, Canada. |
| Pseudocode | No | The paper describes the DVRL algorithm conceptually and with equations, but does not present it in a formal pseudocode or algorithm block. (A schematic, hedged sketch of a particle-filter-style belief update appears below the table.) |
| Open Source Code | No | The paper does not contain any statements about making the source code available or provide links to a code repository. |
| Open Datasets | Yes | We evaluate DVRL on Mountain Hike and on flickering Atari. Atari environments (Bellemare et al., 2013) provide a wide set of challenging tasks with high dimensional observation spaces. |
| Dataset Splits | No | The paper describes generating trajectories through interaction in reinforcement learning environments ("n_e parallel environments") but does not specify explicit train/validation/test dataset splits like those used in supervised learning. |
| Hardware Specification | Yes | The NVIDIA DGX-1 used for this research was donated by the NVIDIA corporation. |
| Software Dependencies | No | The paper mentions PyTorch and GRUs but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | DVRL used 30 particles and we set n_g = 25 for both RNN and DVRL. The latent state h for the RNN-encoder architecture was of dimension 256, and 128 for both z and h for DVRL. Lastly, λ^E = 1 and n_s = 5 were used, together with RMSProp with a learning rate of 10^-4 for both approaches. (These values are collected in a hedged configuration sketch below the table.) |
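
The joint optimisation mentioned in the Research Type row can be pictured as a single scalar loss: an A2C-style RL loss plus an ELBO term weighted by the λ^E coefficient quoted in the experiment setup. The sketch below is an illustration only; the function and argument names (`dvrl_total_loss`, `value_coef`, `entropy_coef`, `neg_elbo`) are hypothetical, and only the default `lambda_e = 1.0` reflects a value reported in the paper.

```python
def dvrl_total_loss(policy_loss, value_loss, entropy, neg_elbo,
                    value_coef=0.5, entropy_coef=0.01, lambda_e=1.0):
    """Illustrative combination of an A2C-style RL loss with an ELBO term.

    `lambda_e` corresponds to the lambda^E = 1 coefficient from the paper's
    experiment setup; `value_coef` and `entropy_coef` are placeholders.
    """
    rl_loss = policy_loss + value_coef * value_loss - entropy_coef * entropy
    # The negative ELBO is added so that minimising the total loss jointly
    # improves the policy/value estimates and the generative model.
    return rl_loss + lambda_e * neg_elbo
```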
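
Since the paper gives no formal algorithm block, the following is a schematic, hedged reconstruction of a particle-filter-style belief update of the kind DVRL describes with equations: resample particles, propose a latent per particle, reweight with transition and observation models, and accumulate a per-step ELBO contribution. The module names (`proposal`, `transition`, `decoder`, `rnn`), tensor shapes, and aggregation details are assumptions, not the authors' implementation.

```python
import math
import torch
import torch.nn.functional as F

def belief_update(particles, action, obs, proposal, transition, decoder, rnn):
    """One schematic belief update over K particles (not the authors' code).

    particles: dict with
        'h':     (K, H) deterministic RNN states
        'log_w': (K,)   log importance weights
    action, obs: tensors already tiled per particle, shapes (K, A) and (K, O).
    Returns the updated particles and this step's ELBO contribution.
    """
    h, log_w = particles['h'], particles['log_w']
    K = h.size(0)

    # 1. Resample ancestor particles according to the normalised weights.
    probs = F.softmax(log_w, dim=0)
    idx = torch.multinomial(probs, K, replacement=True)
    h = h[idx]

    # 2. Propose a new latent per particle from the encoder/proposal network.
    z, log_q = proposal(h, action, obs)          # sample and its log-density

    # 3. Reweight with the transition prior and observation likelihood.
    log_p_z = transition(h, action).log_prob(z)
    log_p_o = decoder(h, z).log_prob(obs)
    log_w = log_p_z + log_p_o - log_q

    # 4. Advance the deterministic state with an RNN step (e.g. a GRU cell).
    h = rnn(torch.cat([z, action, obs], dim=-1), h)

    # 5. The per-step ELBO contribution is the log of the average weight.
    elbo_t = torch.logsumexp(log_w, dim=0) - math.log(K)

    return {'h': h, 'log_w': log_w}, elbo_t
```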
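
The quoted hyperparameters from the Experiment Setup row can be collected into a single configuration object. Only the numerical values come from the paper; the field names below (`num_particles`, `n_g`, `lambda_e`, and so on) are illustrative.

```python
from dataclasses import dataclass

@dataclass
class DVRLConfig:
    """Hyperparameters quoted in the paper's experiment setup (values only)."""
    num_particles: int = 30      # "DVRL used 30 particles"
    n_g: int = 25                # n_g = 25 for both RNN and DVRL
    rnn_latent_dim: int = 256    # latent state h of the RNN-encoder baseline
    dvrl_latent_dim: int = 128   # dimension of both z and h for DVRL
    lambda_e: float = 1.0        # weight of the ELBO term in the joint loss
    n_s: int = 5                 # n_s = 5
    learning_rate: float = 1e-4  # RMSProp learning rate for both approaches

config = DVRLConfig()
```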