Dr. Strategy: Model-Based Generalist Agents with Strategic Dreaming
Authors: Hany Hamed, Subin Kim, Dongyeong Kim, Jaesik Yoon, Sungjin Ahn
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, we show that the proposed model outperforms prior pixel-based MBRL methods in various visually complex and partially observable navigation tasks. |
| Researcher Affiliation | Collaboration | KAIST and SAP. Correspondence to: Sungjin Ahn <sjn.ahn@gmail.com>. |
| Pseudocode | Yes | Algorithm 1 (Dr. Strategy). Initialize: World Model M, Replay buffer D, landmark auto-encoder (enc_ϕ(s), {l_1, ..., l_N}, dec_ϕ(l)), Highway policy π_l(a_t\|s_t, l), Explorer π_e(a_t\|s_t), Achiever π_g(a_t\|s_t, g). A hedged initialization sketch of these components follows the table. |
| Open Source Code | No | The paper does not include an explicit statement about releasing code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | To empirically investigate the proposed agent, we evaluate it in two types of navigation environments and a robot manipulation environment. One type of navigation environment is 2D navigation... We have designed a 3D-Maze navigation... Additionally, our evaluation extends to a robot manipulation environment, the Robo Kitchen benchmark introduced in a prior work (Mendonca et al., 2021). |
| Dataset Splits | No | The paper mentions 'zero-shot evaluation' where goals are unseen during training and are user-defined at test time, which defines a test set. However, it does not explicitly describe a separate validation set or specific validation split percentages/counts. It states: 'For the evaluations, we trained all baselines for 3 seeds per environment', which refers to evaluation runs, not a validation set. |
| Hardware Specification | Yes | The training of our agent took 2 to 6 days depending on the environment, using a 24GB-VRAM GPU. |
| Software Dependencies | No | The paper mentions using 'Dreamer V2 (Hafner et al., 2020)', 'Adam optimizer (Kingma & Ba, 2014)', and 'VQ-VAE', but does not provide specific version numbers for these or other software libraries/frameworks (e.g., PyTorch, TensorFlow, Python version). |
| Experiment Setup | Yes | Appendix E, Hyperparameters (Table 4): We made minor changes only in a few hyper-parameters, such as the learning rates of the world model, actor, and critic, following the hyperparameters of Choreographer (Mazzaglia et al., 2022b), as it also utilizes VQ-VAE like our method. ... Batch size B: 50; Trajectory length T_S: 50; Discrete latent dimensions: 32; Discrete latent classes: 32; ... Learning rate: 3e-4; Imagination horizon H: 15; Discount: 0.99; Lambda-target parameter: 0.95; Actor learning rate: 8e-5; Critic learning rate: 8e-5; ... These settings are collected in the config sketch after the table. |
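
For concreteness, below is a minimal Python sketch of the components that Algorithm 1 initializes: the world model M, replay buffer D, VQ-VAE-style landmark auto-encoder, and the three policies (highway, explorer, achiever). All class names, signatures, and the random-policy and codebook stubs are hypothetical stand-ins chosen for illustration; this is not the authors' implementation, which the paper does not release.

```python
"""Hypothetical structural sketch of Algorithm 1's initialization in Dr. Strategy.

Every name and signature below is an assumption made for illustration;
only the list of components comes from the paper's Algorithm 1.
"""
import random
from dataclasses import dataclass, field


class LandmarkAutoencoder:
    """VQ-VAE-style landmark module: enc_phi(s) maps a model state to the
    nearest of N discrete codes {l_1, ..., l_N}; dec_phi(l) maps back."""

    def __init__(self, num_landmarks: int, state_dim: int):
        # Randomly initialized codebook standing in for learned landmark codes.
        self.codebook = [[random.gauss(0, 1) for _ in range(state_dim)]
                         for _ in range(num_landmarks)]

    def encode(self, state):
        # enc_phi(s): index of the nearest landmark code (squared L2 distance).
        dists = [sum((si - ci) ** 2 for si, ci in zip(state, code))
                 for code in self.codebook]
        return min(range(len(dists)), key=dists.__getitem__)

    def decode(self, landmark_idx):
        # dec_phi(l): here simply the code vector, standing in for a decoded state.
        return self.codebook[landmark_idx]


def make_policy(name):
    """Stub for pi_l(a|s, l), pi_e(a|s), and pi_g(a|s, g): ignores its inputs
    and returns a random action from a hypothetical discrete action space."""
    def policy(state, condition=None):
        return random.choice([0, 1, 2, 3])  # e.g. 4 discrete navigation actions
    return policy


@dataclass
class DrStrategyAgent:
    world_model: object = None                          # world model M
    replay_buffer: list = field(default_factory=list)   # replay buffer D
    landmarks: LandmarkAutoencoder = field(
        default_factory=lambda: LandmarkAutoencoder(num_landmarks=8, state_dim=4))
    highway_policy: callable = make_policy("pi_l")      # pi_l(a_t | s_t, l)
    explorer: callable = make_policy("pi_e")            # pi_e(a_t | s_t)
    achiever: callable = make_policy("pi_g")            # pi_g(a_t | s_t, g)


agent = DrStrategyAgent()
state = [0.1, -0.2, 0.3, 0.0]
l = agent.landmarks.encode(state)
print("nearest landmark:", l, "| highway action:", agent.highway_policy(state, l))
```

The split into a highway policy (travel to a landmark) and an achiever (reach a nearby goal from there) is the structural point this sketch tries to make visible; the stubs carry no learned behavior.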
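
Likewise, a small sketch that collects the hyperparameters quoted in the Experiment Setup row into a single config. The key names are assumptions; only the values come from the paper's Appendix E (Table 4).

```python
# Key names are hypothetical; values are the settings reported in
# Appendix E, Table 4 of the paper.
DR_STRATEGY_HPARAMS = {
    "batch_size": 50,               # B
    "trajectory_length": 50,        # T_S
    "discrete_latent_dims": 32,
    "discrete_latent_classes": 32,
    "world_model_lr": 3e-4,         # "Learning rate" row in Table 4
    "imagination_horizon": 15,      # H
    "discount": 0.99,
    "lambda_target": 0.95,          # lambda-target parameter for returns
    "actor_lr": 8e-5,
    "critic_lr": 8e-5,
}

for name, value in DR_STRATEGY_HPARAMS.items():
    print(f"{name}: {value}")
```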