DeepMDP: Learning Continuous Latent Space Models for Representation Learning

Authors: Carles Gelada, Saurabh Kumar, Jacob Buckman, Ofir Nachum, Marc G. Bellemare

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our theoretical findings are substantiated by the experimental result that a trained DeepMDP recovers the latent structure underlying high-dimensional observations on a synthetic environment. Finally, we show that learning a DeepMDP as an auxiliary task in the Atari 2600 domain leads to large performance improvements over model-free RL. (Section 7: Empirical Evaluation)
Researcher Affiliation | Collaboration | ¹Google Brain; ²Center for Language and Speech Processing, Johns Hopkins University. Correspondence to: Carles Gelada <cgel@google.com>.
Pseudocode | No | The paper does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | Code for replicating all experiments is included in the supplementary material.
Open Datasets | Yes | Experiments use the Atari 2600 domain via the Arcade Learning Environment (Bellemare et al., 2013a).
Dataset Splits | No | The paper evaluates on standard Atari 2600 environments but does not describe any training/validation/test split (no percentages, sample counts, or explicit splitting methodology).
Hardware Specification | No | The paper does not specify the hardware used for the experiments, such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper mentions TensorFlow and Dopamine (Castro et al., 2018) but does not give version numbers for these or any other software components.
Experiment Setup | Yes | For experimental details, see Appendix C. The agent is trained for 200M frames with a batch size of 32. The Adam optimizer is used with an initial learning rate of 6.25e-5, which is then linearly decayed to 0 over the course of training.
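
As an illustration of the reported training configuration, below is a minimal sketch of the optimizer setup using the TensorFlow 2.x Keras API. The paper's framework is TensorFlow, but this construction is not the authors' code, and `TOTAL_UPDATES` is a hypothetical placeholder: the paper reports 200M training frames, and the corresponding number of gradient updates depends on frame skip and update frequency, which are not restated here.

```python
import tensorflow as tf

BATCH_SIZE = 32              # reported batch size
INITIAL_LR = 6.25e-5         # reported initial Adam learning rate
TOTAL_UPDATES = 12_500_000   # hypothetical update count, not from the paper

# Linear decay from INITIAL_LR to 0 can be expressed as
# PolynomialDecay with power=1, matching the reported schedule shape.
lr_schedule = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate=INITIAL_LR,
    decay_steps=TOTAL_UPDATES,
    end_learning_rate=0.0,
    power=1.0,
)

optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
```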