DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations
Authors: Fei Deng, Ingook Jang, Sungjin Ahn
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the DeepMind Control suite show that DreamerPro achieves better overall performance than state-of-the-art contrastive MBRL agents when there are complex background distractions, and maintains similar performance as Dreamer in standard tasks where contrastive MBRL agents can perform much worse. |
| Researcher Affiliation | Academia | Department of Computer Science, Rutgers University; ETRI; School of Computing, KAIST. |
| Pseudocode | No | The paper provides architectural diagrams and mathematical formulations, but no pseudocode or algorithm blocks. |
| Open Source Code | Yes | We implement DreamerPro [1] and Dreaming based on a newer version of Dreamer [2], while the official implementation of TPC [3] is based on an older version. (Footnote 1: https://github.com/fdeng18/dreamer-pro) |
| Open Datasets | Yes | We evaluate our model and the baselines on six image-based continuous control tasks from the DeepMind Control (DMC) suite (Tassa et al., 2018). ... (Tassa et al., 2018) is cited as: DeepMind Control Suite. arXiv preprint arXiv:1801.00690, 2018. ... where the background is replaced by task-irrelevant natural videos randomly sampled from the driving car class in the Kinetics 400 dataset (Kay et al., 2017). ... (Kay et al., 2017) is cited as: The Kinetics human action video dataset. arXiv preprint arXiv:1705.06950, 2017. (A hedged environment-loading sketch for the DMC tasks appears after this table.) |
| Dataset Splits | No | The paper mentions 'training' and 'evaluation' sets for background videos, containing '683 and 69 videos respectively', which constitutes a train/test split. However, it does not explicitly mention a separate 'validation' set or its split details for either the DMC tasks or the background videos. |
| Hardware Specification | Yes | In Table 6 below, we record during training the number of frames processed per second (FPS) by Dreamer and DreamerPro on NVIDIA Quadro RTX 8000 GPUs. |
| Software Dependencies | No | The paper references specific versions of Dreamer's implementation (e.g., 'DreamerV2' with a specific commit hash) but does not provide specific version numbers for underlying software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We adopt the default values for the Dreamer hyperparameters, except that we use continuous latents and tanh normal as the distribution output by the actor. ... Following TPC, we increase the weight of the reward loss J_t^R to 1000 for all models in the natural background setting... We use the default batch size of 50 for Dreamer, Dreaming, and DreamerPro. ... Table 3: Additional hyperparameters in DreamerPro. Number of prototypes K = 2500, prototype dimension = 32, softmax temperature τ = 0.1, Sinkhorn iterations = 3, Sinkhorn epsilon = 0.0125, momentum update fraction η = 0.05. (A hedged Sinkhorn assignment sketch using these values follows directly after this table.) |
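
The prototype hyperparameters reported in the last row above (K = 2500 prototypes of dimension 32, softmax temperature τ = 0.1, 3 Sinkhorn iterations with epsilon 0.0125, momentum fraction η = 0.05) are the ingredients of a SwAV-style prototypical clustering objective. The following is a minimal NumPy sketch of a Sinkhorn-Knopp assignment and cross-entropy term computed with those values; it is a generic illustration under stated assumptions, not the official DreamerPro implementation, and all function and variable names are hypothetical.

```python
import numpy as np

# Hyperparameter values as reported in Table 3 of the paper.
NUM_PROTOTYPES = 2500   # K
PROTO_DIM = 32          # prototype dimension
TEMPERATURE = 0.1       # softmax temperature tau
SINKHORN_ITERS = 3      # Sinkhorn iterations
SINKHORN_EPS = 0.0125   # Sinkhorn epsilon
EMA_FRACTION = 0.05     # momentum update fraction eta (not used in this sketch)


def sinkhorn_assignments(scores, eps=SINKHORN_EPS, n_iters=SINKHORN_ITERS):
    """Turn a (batch x K) score matrix into soft prototype assignments
    whose prototype marginals are pushed toward uniform (SwAV-style)."""
    q = np.exp(scores / eps).T              # K x B
    q /= q.sum()
    k, b = q.shape
    for _ in range(n_iters):
        q /= q.sum(axis=1, keepdims=True)   # normalize over the batch for each prototype
        q /= k
        q /= q.sum(axis=0, keepdims=True)   # normalize over prototypes for each sample
        q /= b
    return (q * b).T                        # B x K, each row sums to 1


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


# Toy usage: L2-normalized features and prototypes give cosine-similarity scores;
# the cross-entropy between the Sinkhorn targets and the temperature-scaled
# softmax predictions is the usual SwAV-style clustering loss term.
rng = np.random.default_rng(0)
feats = rng.normal(size=(50, PROTO_DIM))              # batch size 50, as in the paper
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
protos = rng.normal(size=(NUM_PROTOTYPES, PROTO_DIM))
protos /= np.linalg.norm(protos, axis=1, keepdims=True)

scores = feats @ protos.T                             # B x K cosine similarities
targets = sinkhorn_assignments(scores)                # soft targets from Sinkhorn
preds = softmax(scores / TEMPERATURE, axis=1)         # temperature-scaled predictions
loss = -(targets * np.log(preds + 1e-12)).sum(axis=1).mean()
print(f"SwAV-style cross-entropy: {loss:.4f}")
```

The momentum update fraction η = 0.05 would correspond to an exponential-moving-average update of target parameters, which is omitted from the sketch for brevity.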
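
Relatedly, for the Open Datasets row, here is a minimal sketch of loading one DMC task with dm_control and rendering image observations. The "walker"/"walk" task, the random-action loop, and the 64x64 render size are illustrative assumptions; the natural-background variant (Kinetics 400 driving-car videos composited behind the agent) is not part of dm_control itself and is only indicated by a comment.

```python
import numpy as np
from dm_control import suite

# Load one image-based continuous control task from the DMC suite.
# "walker"/"walk" is an illustrative choice for this sketch.
env = suite.load(domain_name="walker", task_name="walk")
action_spec = env.action_spec()
time_step = env.reset()

for _ in range(10):
    # Sample a random action within the bounded action space.
    action = np.random.uniform(action_spec.minimum, action_spec.maximum,
                               size=action_spec.shape)
    time_step = env.step(action)
    # Render a 64x64 RGB observation; in the natural background setting,
    # the background pixels would be replaced by a frame from a
    # Kinetics 400 "driving car" video.
    frame = env.physics.render(height=64, width=64, camera_id=0)
    print(time_step.reward, frame.shape)
```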