HarmonyDream: Task Harmonization Inside World Models

Authors: Haoyu Ma, Jialong Wu, Ningya Feng, Chenjun Xiao, Dong Li, Jianye Hao, Jianmin Wang, Mingsheng Long

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that the base MBRL method equipped with HarmonyDream gains 10%–69% absolute performance boosts on visual robotic tasks and sets a new state-of-the-art result on the Atari 100K benchmark.
Researcher Affiliation | Collaboration | ¹School of Software, BNRist, Tsinghua University. ²Huawei Noah's Ark Lab. ³College of Intelligence and Computing, Tianjin University.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks that are explicitly labeled as 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | Code is available at https://github.com/thuml/HarmonyDream.
Open Datasets | Yes | We consider the tasks of pulling a lever up, pulling a handle up sideways, and hammering a screw on the wall, from the Meta-world domain (Yu et al., 2020b)... We conduct experiments on two relatively easy tasks (Push Button and Reach Target) with dense rewards... DMC Remastered (Grigsby & Qi, 2020)... Atari 100K benchmark (Kaiser et al., 2020)... challenging task from Minecraft (Fan et al., 2022)...
Dataset Splits | Yes | This dataset is subsequently divided into a training set and a validation set at a ratio of 90% to 10%.
Hardware Specification | No | The paper mentions general GPU memory requirements ('10GB GPU memory', '5GB GPU memory', 'typical 12GB GPUs') but does not provide specific hardware details such as exact GPU/CPU models or processor types.
Software Dependencies | No | The paper states 'We implement our HarmonyDream based on DreamerV2 using PyTorch (Paszke et al., 2019)' and mentions 'automatic mixed precision (Micikevicius et al., 2018)'. While the software is named, specific version numbers are not provided, only citation years.
Experiment Setup | Yes | Important hyperparameters for HarmonyDream are listed in Table 2, including: observation size 64 × 64 × 3; action repeat 1 for Meta-world, 2 for RLBench, DMCR and Natural Background DMC; max episode length 500; training frequency every 5 environment steps; imagination horizon H = 15; discount γ = 0.99; batch size 50 for Meta-world and RLBench, 16 for DMCR and Natural Background DMC; world model learning rate 3 × 10⁻⁴; actor learning rate 8 × 10⁻⁵; critic learning rate 8 × 10⁻⁵.
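
The reported hyperparameters can be gathered into a single configuration mapping for reference. The sketch below is illustrative only: the dictionary keys and the make_config helper are hypothetical names, not part of the HarmonyDream codebase; the values simply transcribe the paper's Table 2 as quoted in the row above.

```python
# Illustrative sketch only: key names and the helper are hypothetical,
# not taken from the HarmonyDream repository. Values transcribe the
# hyperparameters reported in Table 2 of the paper.

BASE_CONFIG = {
    "observation_size": (64, 64, 3),   # image observations, H x W x C
    "max_episode_length": 500,
    "train_every_env_steps": 5,        # training frequency
    "imagination_horizon": 15,         # H
    "discount": 0.99,                  # gamma
    "world_model_lr": 3e-4,
    "actor_lr": 8e-5,
    "critic_lr": 8e-5,
}

# Domain-dependent settings: action repeat and batch size differ per benchmark.
DOMAIN_OVERRIDES = {
    "metaworld":              {"action_repeat": 1, "batch_size": 50},
    "rlbench":                {"action_repeat": 2, "batch_size": 50},
    "dmcr":                   {"action_repeat": 2, "batch_size": 16},
    "natural_background_dmc": {"action_repeat": 2, "batch_size": 16},
}

def make_config(domain: str) -> dict:
    """Merge the shared hyperparameters with the domain-specific overrides."""
    if domain not in DOMAIN_OVERRIDES:
        raise ValueError(f"Unknown domain: {domain!r}")
    return {**BASE_CONFIG, **DOMAIN_OVERRIDES[domain]}

if __name__ == "__main__":
    # Example: settings used for Meta-world experiments.
    print(make_config("metaworld"))
```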