DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations
Authors: Fei Deng, Ingook Jang, Sungjin Ahn
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the DeepMind Control suite show that DreamerPro achieves better overall performance than state-of-the-art contrastive MBRL agents when there are complex background distractions, and maintains similar performance as Dreamer in standard tasks where contrastive MBRL agents can perform much worse. |
| Researcher Affiliation | Academia | Department of Computer Science, Rutgers University; ETRI; School of Computing, KAIST. |
| Pseudocode | No | The paper provides architectural diagrams and mathematical formulations, but no pseudocode or algorithm blocks. |
| Open Source Code | Yes | We implement DreamerPro [1] and Dreaming based on a newer version of Dreamer [2], while the official implementation of TPC [3] is based on an older version. (Footnote 1: https://github.com/fdeng18/dreamer-pro) |
| Open Datasets | Yes | We evaluate our model and the baselines on six image-based continuous control tasks from the DeepMind Control (DMC) suite (Tassa et al., 2018). ... (Tassa et al., 2018) is cited as: DeepMind Control Suite. arXiv preprint arXiv:1801.00690, 2018. ... where the background is replaced by task-irrelevant natural videos randomly sampled from the driving car class in the Kinetics 400 dataset (Kay et al., 2017). ... (Kay et al., 2017) is cited as: The Kinetics human action video dataset. arXiv preprint arXiv:1705.06950, 2017. (A hedged environment-loading sketch for the DMC tasks appears after this table.) |
| Dataset Splits | No | The paper mentions 'training' and 'evaluation' sets for background videos, containing '683 and 69 videos respectively', which constitutes a train/test split. However, it does not explicitly mention a separate 'validation' set or its split details for either the DMC tasks or the background videos. |
| Hardware Specification | Yes | In Table 6 below, we record during training the number of frames processed per second (FPS) by Dreamer and DreamerPro on NVIDIA Quadro RTX 8000 GPUs. |
| Software Dependencies | No | The paper references specific versions of Dreamer's implementation (e.g., 'DreamerV2' with a specific commit hash) but does not provide specific version numbers for underlying software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We adopt the default values for the Dreamer hyperparameters, except that we use continuous latents and tanh normal as the distribution output by the actor. ... Following TPC, we increase the weight of the reward loss J_t^R to 1000 for all models in the natural background setting... We use the default batch size of 50 for Dreamer, Dreaming, and DreamerPro. ... Table 3: Additional hyperparameters in DreamerPro. Number of prototypes K = 2500, prototype dimension = 32, softmax temperature τ = 0.1, Sinkhorn iterations = 3, Sinkhorn epsilon = 0.0125, momentum update fraction η = 0.05. (A hedged Sinkhorn assignment sketch using these values follows directly after this table.) |
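
The prototype hyperparameters reported in the last row above (K = 2500 prototypes of dimension 32, softmax temperature τ = 0.1, 3 Sinkhorn iterations with epsilon 0.0125, momentum fraction η = 0.05) are the ingredients of a SwAV-style prototypical clustering objective. The following is a minimal NumPy sketch of a Sinkhorn-Knopp assignment and cross-entropy term computed with those values; it is a generic illustration under stated assumptions, not the official DreamerPro implementation, and all function and variable names are hypothetical.

```python
import numpy as np

# Hyperparameter values as reported in Table 3 of the paper.
NUM_PROTOTYPES = 2500   # K
PROTO_DIM = 32          # prototype dimension
TEMPERATURE = 0.1       # softmax temperature tau
SINKHORN_ITERS = 3      # Sinkhorn iterations
SINKHORN_EPS = 0.0125   # Sinkhorn epsilon
EMA_FRACTION = 0.05     # momentum update fraction eta (not used in this sketch)


def sinkhorn_assignments(scores, eps=SINKHORN_EPS, n_iters=SINKHORN_ITERS):
    """Turn a (batch x K) score matrix into soft prototype assignments
    whose prototype marginals are pushed toward uniform (SwAV-style)."""
    q = np.exp(scores / eps).T              # K x B
    q /= q.sum()
    k, b = q.shape
    for _ in range(n_iters):
        q /= q.sum(axis=1, keepdims=True)   # normalize over the batch for each prototype
        q /= k
        q /= q.sum(axis=0, keepdims=True)   # normalize over prototypes for each sample
        q /= b
    return (q * b).T                        # B x K, each row sums to 1


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


# Toy usage: L2-normalized features and prototypes give cosine-similarity scores;
# the cross-entropy between the Sinkhorn targets and the temperature-scaled
# softmax predictions is the usual SwAV-style clustering loss term.
rng = np.random.default_rng(0)
feats = rng.normal(size=(50, PROTO_DIM))              # batch size 50, as in the paper
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
protos = rng.normal(size=(NUM_PROTOTYPES, PROTO_DIM))
protos /= np.linalg.norm(protos, axis=1, keepdims=True)

scores = feats @ protos.T                             # B x K cosine similarities
targets = sinkhorn_assignments(scores)                # soft targets from Sinkhorn
preds = softmax(scores / TEMPERATURE, axis=1)         # temperature-scaled predictions
loss = -(targets * np.log(preds + 1e-12)).sum(axis=1).mean()
print(f"SwAV-style cross-entropy: {loss:.4f}")
```

The momentum update fraction η = 0.05 would correspond to an exponential-moving-average update of target parameters, which is omitted from the sketch for brevity.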
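
Relatedly, for the Open Datasets row, here is a minimal sketch of loading one DMC task with dm_control and rendering image observations. The "walker"/"walk" task, the random-action loop, and the 64x64 render size are illustrative assumptions; the natural-background variant (Kinetics 400 driving-car videos composited behind the agent) is not part of dm_control itself and is only indicated by a comment.

```python
import numpy as np
from dm_control import suite

# Load one image-based continuous control task from the DMC suite.
# "walker"/"walk" is an illustrative choice for this sketch.
env = suite.load(domain_name="walker", task_name="walk")
action_spec = env.action_spec()
time_step = env.reset()

for _ in range(10):
    # Sample a random action within the bounded action space.
    action = np.random.uniform(action_spec.minimum, action_spec.maximum,
                               size=action_spec.shape)
    time_step = env.step(action)
    # Render a 64x64 RGB observation; in the natural background setting,
    # the background pixels would be replaced by a frame from a
    # Kinetics 400 "driving car" video.
    frame = env.physics.render(height=64, width=64, camera_id=0)
    print(time_step.reward, frame.shape)
```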