TD-MPC2: Scalable, Robust World Models for Continuous Control

Authors: Nicklas Hansen, Hao Su, Xiaolong Wang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate TD-MPC2 across a total of 104 diverse continuous control tasks spanning 4 task domains: DMControl (Tassa et al., 2018), Meta-World (Yu et al., 2019), ManiSkill2 (Gu et al., 2023), and MyoSuite (Caggiano et al., 2022). We summarize our results in Figure 1, and visualize task domains in Figure 2. ... Our results demonstrate that TD-MPC2 consistently outperforms existing model-based and model-free methods, using the same hyperparameters across all tasks (Figure 1, right).
Researcher Affiliation | Academia | University of California San Diego ({nihansen,haosu,xiw012}@ucsd.edu)
Pseudocode | No | The paper describes the algorithm in prose and provides architectural details, but it does not include a formal pseudocode block or algorithm listing.
Open Source Code | Yes | In support of open-source science, we publicly release 300+ model checkpoints, datasets, and code for training and evaluating TD-MPC2 agents, which is available at https://tdmpc2.com.
Open Datasets | Yes | We evaluate TD-MPC2 across a total of 104 diverse continuous control tasks spanning 4 task domains: DMControl (Tassa et al., 2018), Meta-World (Yu et al., 2019), ManiSkill2 (Gu et al., 2023), and MyoSuite (Caggiano et al., 2022).
Dataset Splits | No | The paper does not explicitly describe train/validation/test dataset splits with specific percentages or sample counts for the environments or collected data. It mentions 'validation' in the context of the Q-function ensemble, but not as an overall dataset partition.
Hardware Specification | Yes | Approximate TD-MPC2 training cost on the 80-task dataset, reported in GPU days on a single NVIDIA GeForce RTX 3090 GPU.
Software Dependencies | No | The paper mentions 'PyTorch-like notation' for the architecture and refers to libraries like 'LayerNorm (Ba et al., 2016)' and 'Mish (Misra, 2019)', but it does not specify exact version numbers for any software dependencies.
Experiment Setup | Yes | We use the same hyperparameters across all tasks. Our hyperparameters are listed in Table 8. (Table 8 details Planning Horizon, Iterations, Batch size, Learning rate, etc.)