TD-MPC2: Scalable, Robust World Models for Continuous Control
Authors: Nicklas Hansen, Hao Su, Xiaolong Wang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate TD-MPC2 across a total of 104 diverse continuous control tasks spanning 4 task domains: DMControl (Tassa et al., 2018), Meta-World (Yu et al., 2019), ManiSkill2 (Gu et al., 2023), and MyoSuite (Caggiano et al., 2022). We summarize our results in Figure 1, and visualize task domains in Figure 2. ... Our results demonstrate that TD-MPC2 consistently outperforms existing model-based and model-free methods, using the same hyperparameters across all tasks (Figure 1, right). |
| Researcher Affiliation | Academia | University of California San Diego, Equal advising {nihansen,haosu,xiw012}@ucsd.edu |
| Pseudocode | No | The paper describes the algorithm in prose and provides architectural details, but it does not include a formal pseudocode block or algorithm listing. |
| Open Source Code | Yes | In support of open-source science, we publicly release 300+ model checkpoints, datasets, and code for training and evaluating TD-MPC2 agents, which is available at https://tdmpc2.com. |
| Open Datasets | Yes | We evaluate TD-MPC2 across a total of 104 diverse continuous control tasks spanning 4 task domains: DMControl (Tassa et al., 2018), Meta-World (Yu et al., 2019), ManiSkill2 (Gu et al., 2023), and MyoSuite (Caggiano et al., 2022). |
| Dataset Splits | No | The paper does not explicitly describe train/validation/test dataset splits with specific percentages or sample counts for the environments or collected data. It mentions 'validation' in the context of Q-function ensemble, but not for overall dataset partitioning. |
| Hardware Specification | Yes | Approximate TD-MPC2 training cost on the 80-task dataset, reported in GPU days on a single NVIDIA GeForce RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch-like notation' for the architecture and refers to libraries like 'LayerNorm (Ba et al., 2016)' and 'Mish (Misra, 2019)', but it does not specify exact version numbers for any software dependencies. |
| Experiment Setup | Yes | We use the same hyperparameters across all tasks. Our hyperparameters are listed in Table 8. (Table 8 details Planning Horizon, Iterations, Batch size, Learning rate, etc.) |
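The "same hyperparameters across all tasks" claim in the last row can be sketched as a single shared configuration object reused for every task. The sketch below is illustrative only: the field names mirror the categories listed for Table 8 (planning horizon, iterations, batch size, learning rate), but the values are placeholders, not the actual numbers from the paper.

```python
# Hypothetical sketch of a single hyperparameter set shared by all 104 tasks.
# Field values are placeholders, NOT the actual values from Table 8.
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Hyperparams:
    planning_horizon: int = 3     # placeholder
    iterations: int = 6           # placeholder
    batch_size: int = 256         # placeholder
    learning_rate: float = 3e-4   # placeholder


def make_agent_config(task: str, hp: Hyperparams = Hyperparams()) -> dict:
    """Every task receives the identical hyperparameter set; only the
    task identifier differs between configurations."""
    return {"task": task, **asdict(hp)}


# Two tasks from different domains share every setting except the task name:
cfg_a = make_agent_config("dmcontrol-walker-walk")
cfg_b = make_agent_config("metaworld-assembly")
shared = {k: v for k, v in cfg_a.items() if k != "task"}
assert shared == {k: v for k, v in cfg_b.items() if k != "task"}
```

This mirrors the reported setup in spirit: per-task tuning is replaced by one fixed configuration, which is what makes the cross-domain comparison in Figure 1 meaningful.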