Efficient Model-Based Deep Reinforcement Learning with Variational State Tabulation
Authors: Dane Corneil, Wulfram Gerstner, Johanni Brea
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated the VaST agent on a series of navigation tasks implemented in the VizDoom environment... We compared the performance of VaST against two recently published sample-efficient model-free approaches: Neural Episodic Control (NEC) (Pritzel et al., 2017) and Prioritized Double DQN (Schaul et al., 2015). |
| Researcher Affiliation | Academia | Dane Corneil, Wulfram Gerstner, Johanni Brea — Laboratory of Computational Neuroscience (LCN), School of Computer and Communication Sciences and Brain Mind Institute, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Switzerland. |
| Pseudocode | Yes | The pseudocode of VaST, and of our implementation of prioritized sweeping, are in the Supplementary Material. |
| Open Source Code | Yes | The full code for VaST can be found at https://github.com/danecor/VaST/. |
| Open Datasets | Yes | We evaluated the VaST agent on a series of navigation tasks implemented in the VizDoom environment (see Figure 3A, Kempka et al. (2016)). |
| Dataset Splits | No | The paper describes evaluation over "test epochs" but does not provide training/validation/test dataset splits with percentages or sample counts; such splits are typical of static datasets rather than online reinforcement learning environments. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory specifications used for running its experiments. |
| Software Dependencies | No | The paper mentions software environments like "Viz Doom environment" and "Arcade Learning Environment" but does not provide specific version numbers for these or any other ancillary software components or libraries. |
| Experiment Setup | Yes | We use a multilayer perceptron (3 layers for each possible action) for the transition model p_θT... and ...temperatures taken from those suggested in (Maddison et al., 2016): λ1 = 2/3 for the posterior distribution and λ2 = 0.5 for evaluating the transition log probabilities. We used two replay memory sizes (N = 100 000 and N = 500 000). |
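The temperatures λ1 = 2/3 and λ2 = 0.5 quoted above refer to the Concrete (Gumbel-Softmax) relaxation of discrete states used by VaST. The sketch below illustrates how such a relaxed sample is drawn at those temperatures; it is a minimal NumPy illustration, not the authors' code — the function name and example logits are assumptions for demonstration only.

```python
import numpy as np

def sample_concrete(logits, temperature, rng):
    """Draw a relaxed one-hot sample from the Concrete (Gumbel-Softmax)
    distribution: softmax of (logits + Gumbel noise) / temperature."""
    # Standard Gumbel noise via the inverse-CDF transform.
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + gumbel) / temperature
    y = y - y.max()  # shift for numerical stability before exponentiating
    expy = np.exp(y)
    return expy / expy.sum()

rng = np.random.default_rng(0)
logits = np.array([2.0, 0.5, -1.0])  # hypothetical unnormalized state scores

# Temperatures reported in the paper (following Maddison et al., 2016):
# lambda_1 = 2/3 for the posterior, lambda_2 = 0.5 for transition log-probs.
posterior_sample = sample_concrete(logits, temperature=2 / 3, rng=rng)
transition_sample = sample_concrete(logits, temperature=0.5, rng=rng)

print(posterior_sample, posterior_sample.sum())  # entries sum to 1
```

Lower temperatures push samples closer to exact one-hot vectors (hard discrete states), while higher temperatures yield smoother, more uniform vectors that keep gradients usable during training.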