Dr. Strategy: Model-Based Generalist Agents with Strategic Dreaming

Authors: Hany Hamed, Subin Kim, Dongyeong Kim, Jaesik Yoon, Sungjin Ahn

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments, we show that the proposed model outperforms prior pixel-based MBRL methods in various visually complex and partially observable navigation tasks.
Researcher Affiliation | Collaboration | ¹KAIST, ²SAP. Correspondence to: Sungjin Ahn <sjn.ahn@gmail.com>.
Pseudocode | Yes | Algorithm 1 Dr. Strategy. Initialize: world model M; replay buffer D; landmark auto-encoder (encϕ(s), {l1, ..., lN}, decϕ(l)); highway policy πl(at|st, l); explorer πe(at|st); achiever πg(at|st, g)
Open Source Code | No | The paper does not include an explicit statement about releasing code or a link to a code repository for the methodology described.
Open Datasets | Yes | To empirically investigate the proposed agent, we evaluate it in two types of navigation environments and a robot manipulation environment. One type of navigation environment is 2D navigation... We have designed a 3D-Maze navigation... Additionally, our evaluation extends to a robot manipulation environment, the Robo Kitchen benchmark introduced in a prior work (Mendonca et al., 2021).
Dataset Splits | No | The paper mentions 'zero-shot evaluation' where goals are unseen during training and are user-defined at test time, which defines a test set. However, it does not explicitly describe a separate validation set or specific validation split percentages/counts. It states: 'For the evaluations, we trained all baselines for 3 seeds per environment', which refers to evaluation runs, not a validation set.
Hardware Specification | Yes | The training of our agent took 2 to 6 days, depending on the environment, using a GPU with 24 GB of VRAM.
Software Dependencies | No | The paper mentions using 'Dreamer V2 (Hafner et al., 2020)', 'Adam optimizer (Kingma & Ba, 2014)', and 'VQ-VAE', but does not provide specific version numbers for these or other software libraries/frameworks (e.g., PyTorch, TensorFlow, Python version).
Experiment Setup | Yes | Appendix E, Hyperparameters, Table 4. We made minor changes only in a few hyperparameters, such as the learning rates of the world model, actor, and critic, by following the hyperparameters of Choreographer (Mazzaglia et al., 2022b), as it also utilizes VQ-VAE like our method. ... Batch size B = 50; Trajectory length TS = 50; Discrete latent dimensions = 32; Discrete latent classes = 32; ... Learning rate = 3e-4; Imagination horizon H = 15; Discount = 0.99; Lambda-target parameter = 0.95; Actor learning rate = 8e-5; Critic learning rate = 8e-5; ...
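The components listed in the Algorithm 1 excerpt above (world model, replay buffer, landmark auto-encoder, and the three policies) can be sketched as an initialization skeleton. This is a minimal illustrative sketch only: the paper releases no code, so all class names, the `Component` placeholder, and the landmark count `N_LANDMARKS` are assumptions, not the authors' implementation.

```python
# Sketch of the initialization step in Algorithm 1 (Dr. Strategy).
# All names below are illustrative assumptions; only the set of
# components comes from the paper's pseudocode.

N_LANDMARKS = 16  # assumed value for N in the landmark set {l_1, ..., l_N}

class Component:
    """Placeholder for a learned module (network + optimizer)."""
    def __init__(self, name):
        self.name = name

# World model M (Dreamer-style latent dynamics, per the paper)
world_model = Component("world_model")

# Replay buffer D
replay_buffer = []

# Landmark auto-encoder: enc_phi(s) -> code, discrete landmarks, dec_phi(l)
landmark_encoder = Component("enc_phi")
landmark_decoder = Component("dec_phi")
landmarks = [f"l_{i}" for i in range(1, N_LANDMARKS + 1)]

# Policies: highway pi_l(a_t|s_t, l), explorer pi_e(a_t|s_t),
# achiever pi_g(a_t|s_t, g)
highway_policy = Component("pi_l")
explorer_policy = Component("pi_e")
achiever_policy = Component("pi_g")
```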
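The hyperparameter values quoted from Appendix E / Table 4 can be collected into a single config mapping. The values are taken directly from the excerpt; the dictionary keys are paraphrased labels, not the paper's exact field names, and the elided entries ("...") are not filled in.

```python
# Hyperparameters quoted in the Experiment Setup row (Appendix E, Table 4).
# Keys are paraphrased labels (an assumption); values are from the excerpt.
config = {
    "batch_size": 50,              # Batch size B
    "trajectory_length": 50,       # Trajectory length TS
    "discrete_latent_dims": 32,
    "discrete_latent_classes": 32,
    "world_model_lr": 3e-4,        # Learning rate
    "imagination_horizon": 15,     # H
    "discount": 0.99,
    "lambda_target": 0.95,
    "actor_lr": 8e-5,
    "critic_lr": 8e-5,
}
```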