DeepMDP: Learning Continuous Latent Space Models for Representation Learning
Authors: Carles Gelada, Saurabh Kumar, Jacob Buckman, Ofir Nachum, Marc G. Bellemare
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theoretical findings are substantiated by the experimental result that a trained DeepMDP recovers the latent structure underlying high-dimensional observations in a synthetic environment. Finally, we show that learning a DeepMDP as an auxiliary task in the Atari 2600 domain leads to large performance improvements over model-free RL. (Section 7, Empirical Evaluation) |
| Researcher Affiliation | Collaboration | 1Google Brain 2Center for Language and Speech Processing, Johns Hopkins University. Correspondence to: Carles Gelada <cgel@google.com>. |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | Code for replicating all experiments is included in the supplementary material. |
| Open Datasets | Yes | Experiments use the Atari 2600 domain via the Arcade Learning Environment (Bellemare et al., 2013a). |
| Dataset Splits | No | The paper uses standard environments like Atari 2600 but does not provide specific details on how the datasets within these environments were split into training, validation, and testing sets (e.g., percentages, sample counts, or explicit splitting methodology). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions software frameworks such as TensorFlow and Dopamine (Castro et al., 2018) but does not provide specific version numbers for these or any other ancillary software components. |
| Experiment Setup | Yes | For experimental details, see Appendix C. The agent is trained for 200M frames with a batch size of 32. The Adam optimizer is used with an initial learning rate of 6.25e-5, which is then linearly decayed to 0 over the course of training. |
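The Experiment Setup row reports an Adam learning rate of 6.25e-5 decayed linearly to 0 over 200M training frames. A minimal sketch of that decay schedule (the function name and the per-frame decay granularity are assumptions, not details stated in the paper) could look like:

```python
# Sketch of a linear learning-rate decay from 6.25e-5 to 0 over 200M frames.
# Assumption: the rate decays per frame; the paper only states "linearly
# decayed to 0 over the course of training".
INITIAL_LR = 6.25e-5
TOTAL_FRAMES = 200_000_000

def linearly_decayed_lr(frame: int) -> float:
    """Return the learning rate at a given training frame."""
    fraction_remaining = max(0.0, 1.0 - frame / TOTAL_FRAMES)
    return INITIAL_LR * fraction_remaining

# The full rate applies at frame 0 and reaches 0 at the final frame.
print(linearly_decayed_lr(0))             # 6.25e-05
print(linearly_decayed_lr(TOTAL_FRAMES))  # 0.0
```

In practice this schedule would be passed to the Adam optimizer at each update step (e.g., via a learning-rate schedule object in the training framework).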