DeepMDP: Learning Continuous Latent Space Models for Representation Learning
Authors: Carles Gelada, Saurabh Kumar, Jacob Buckman, Ofir Nachum, Marc G. Bellemare
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theoretical findings are substantiated by the experimental result that a trained DeepMDP recovers the latent structure underlying high-dimensional observations in a synthetic environment. Finally, we show that learning a DeepMDP as an auxiliary task in the Atari 2600 domain leads to large performance improvements over model-free RL. (Section 7, Empirical Evaluation) |
| Researcher Affiliation | Collaboration | 1Google Brain 2Center for Language and Speech Processing, Johns Hopkins University. Correspondence to: Carles Gelada <cgel@google.com>. |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | Code for replicating all experiments is included in the supplementary material. |
| Open Datasets | Yes | Experiments use the Atari 2600 domain via the Arcade Learning Environment (Bellemare et al., 2013a). |
| Dataset Splits | No | The paper uses standard environments like Atari 2600 but does not provide specific details on how the datasets within these environments were split into training, validation, and testing sets (e.g., percentages, sample counts, or explicit splitting methodology). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions software frameworks such as TensorFlow and Dopamine (Castro et al., 2018) but does not provide specific version numbers for these or any other ancillary software components. |
| Experiment Setup | Yes | For experimental details, see Appendix C. The agent is trained for 200M frames with a batch size of 32. The Adam optimizer is used with an initial learning rate of 6.25e-5, which is then linearly decayed to 0 over the course of training. |
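The Experiment Setup row reports an Adam learning rate of 6.25e-5 decayed linearly to 0 over 200M training frames. A minimal sketch of that decay schedule (the function name and the per-frame decay granularity are assumptions, not details stated in the paper) could look like:

```python
# Sketch of a linear learning-rate decay from 6.25e-5 to 0 over 200M frames.
# Assumption: the rate decays per frame; the paper only states "linearly
# decayed to 0 over the course of training".
INITIAL_LR = 6.25e-5
TOTAL_FRAMES = 200_000_000

def linearly_decayed_lr(frame: int) -> float:
    """Return the learning rate at a given training frame."""
    fraction_remaining = max(0.0, 1.0 - frame / TOTAL_FRAMES)
    return INITIAL_LR * fraction_remaining

# The full rate applies at frame 0 and reaches 0 at the final frame.
print(linearly_decayed_lr(0))             # 6.25e-05
print(linearly_decayed_lr(TOTAL_FRAMES))  # 0.0
```

In practice this schedule would be passed to the Adam optimizer at each update step (e.g., via a learning-rate schedule object in the training framework).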