Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement
Authors: Zhi Wang, Li Zhang, Wenhao Wu, Yuanheng Zhu, Dongbin Zhao, Chunlin Chen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on MuJoCo and Meta-World benchmarks across various dataset types show that Meta-DT exhibits superior few-shot and zero-shot generalization capacity compared to strong baselines, while being more practical with fewer prerequisites. |
| Researcher Affiliation | Academia | ¹Nanjing University, ²Institute of Automation, Chinese Academy of Sciences. {zhiwang, clchen}@nju.edu.cn; {lizhang, whao_wu}@smail.nju.edu.cn; {yuanheng.zhu, dongbin.zhao}@ia.ac.cn |
| Pseudocode | Yes | Appendix A. Algorithm Pseudocodes. Based on the implementations in Sec. 4, this section gives brief procedures for the method: Algorithm 1 presents the pretraining of the context-aware world model; Algorithm 2 shows the pipeline for training Meta-DT, with the sub-procedure for generating the complementary prompt given in Algorithm 3; Algorithms 4 and 5 show the few-shot and zero-shot evaluations on test tasks, respectively. (A minimal sketch of the Stage-1 world-model pretraining appears after the table.) |
| Open Source Code | Yes | Our code is available at https://github.com/NJU-RL/Meta-DT. |
| Open Datasets | Yes | We evaluate all tested methods on three classical benchmarks in meta-RL: i) the 2D navigation environment Point-Robot [25]; ii) the multi-task MuJoCo control suite [55, 36], containing Cheetah-Vel, Cheetah-Dir, Ant-Dir, Hopper-Param, and Walker-Param; and iii) the Meta-World manipulation platform [56], including Reach, Sweep, and Door-Lock. |
| Dataset Splits | No | For each environment, we randomly sample a distribution of tasks and divide them into the training set T_train and test set T_test. ... For the Point-Robot and MuJoCo environments, we sample 45 tasks for training and another 5 held-out tasks for testing. For Meta-World environments, we sample 15 tasks for training and 5 held-out tasks for testing. (No explicit validation set is mentioned; only a train/test split over tasks. See the task-split sketch after the table.) |
| Hardware Specification | Yes | We train our models on one NVIDIA RTX 4080 GPU with an Intel Core i9-10900X CPU and 256 GB of RAM. |
| Software Dependencies | No | The paper mentions implementing Meta-DT on top of the official DT codebase and notes the optimizer (Adam) and other training parameters, but it does not specify versions for key software dependencies such as Python, PyTorch/TensorFlow, or CUDA. |
| Experiment Setup | Yes | Some common hyperparameters shared across all experiments are set as: optimizer Adam, weight decay 1e-4, 10,000 linear warmup steps for the learning-rate schedule, gradient norm clip 0.25, dropout 0.1, and batch size 128. Table 7 presents the detailed hyperparameters of Meta-DT trained on the Point-Robot and MuJoCo domains with the Medium, Expert, and Mixed datasets. Table 8 presents the detailed hyperparameters of Meta-DT trained on Meta-World environments with the Medium datasets. (See the optimizer/warmup sketch after the table.) |
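
To make the pseudocode row concrete, the sketch below illustrates the general shape of Stage-1 pretraining of a context-aware world model (cf. Algorithm 1). All module names, dimensions, and the training loop are hypothetical stand-ins for illustration, not the authors' implementation; see the official repository at https://github.com/NJU-RL/Meta-DT for the real code.

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Maps a short history of (state, action, reward) transitions to a task embedding z."""
    def __init__(self, obs_dim, act_dim, context_dim, hidden=128):
        super().__init__()
        self.gru = nn.GRU(obs_dim + act_dim + 1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, context_dim)

    def forward(self, history):               # history: (B, H, obs_dim + act_dim + 1)
        out, _ = self.gru(history)
        return self.head(out[:, -1])          # z: (B, context_dim)

class WorldModel(nn.Module):
    """Predicts next state and reward from (s, a, z); training it jointly with the
    encoder pushes z to carry the task-specific dynamics/reward information."""
    def __init__(self, obs_dim, act_dim, context_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + context_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, obs_dim + 1))    # outputs [next_state, reward]

    def forward(self, s, a, z):
        return self.net(torch.cat([s, a, z], dim=-1))

# Stage 1: pretrain encoder + world model on the offline multi-task dataset.
obs_dim, act_dim, context_dim, H, B = 8, 2, 16, 20, 32
encoder = ContextEncoder(obs_dim, act_dim, context_dim)
world_model = WorldModel(obs_dim, act_dim, context_dim)
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(world_model.parameters()), lr=3e-4)

for _ in range(3):                             # dummy batches for illustration only
    history = torch.randn(B, H, obs_dim + act_dim + 1)
    s, a = torch.randn(B, obs_dim), torch.randn(B, act_dim)
    target = torch.randn(B, obs_dim + 1)       # ground-truth [s', r] from the dataset
    z = encoder(history)
    loss = nn.functional.mse_loss(world_model(s, a, z), target)
    opt.zero_grad(); loss.backward(); opt.step()
```

In the paper's pipeline, the pretrained encoder then supplies the task representation that conditions the decision transformer in Stage 2 (Algorithms 2-3); that stage is omitted here for brevity.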
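The dataset-splits row reports a train/test split over sampled tasks with no validation set. The toy helper below shows what such a split looks like; `split_tasks` and the uniform 2-D goal sampling are hypothetical illustrations, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def split_tasks(tasks, n_train, n_test):
    """Randomly partition sampled task parameters into disjoint train/test sets."""
    idx = rng.permutation(len(tasks))
    train = [tasks[i] for i in idx[:n_train]]
    test = [tasks[i] for i in idx[n_train:n_train + n_test]]
    return train, test

# e.g., 50 sampled 2-D goal positions for Point-Robot, split 45/5 as in the paper;
# Meta-World environments would use split_tasks(tasks, n_train=15, n_test=5).
goals = [tuple(rng.uniform(-1.0, 1.0, size=2)) for _ in range(50)]
train_tasks, test_tasks = split_tasks(goals, n_train=45, n_test=5)
assert not set(train_tasks) & set(test_tasks)  # held-out tasks never seen in training
```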
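Finally, the shared hyperparameters in the experiment-setup row translate into a standard PyTorch training configuration. This is a minimal sketch: only the named constants come from the paper; the model, data, and base learning rate of 1e-4 are assumptions (per-domain values live in the paper's Tables 7 and 8).

```python
import torch
import torch.nn as nn

# Shared hyperparameters reported in the paper's experiment setup.
WEIGHT_DECAY = 1e-4
WARMUP_STEPS = 10_000
GRAD_NORM_CLIP = 0.25
DROPOUT = 0.1
BATCH_SIZE = 128
LR = 1e-4  # assumed base value; see Tables 7-8 for per-domain settings

# Stand-in for the Meta-DT transformer policy (hypothetical architecture).
model = nn.Sequential(
    nn.Linear(32, 256), nn.ReLU(), nn.Dropout(DROPOUT), nn.Linear(256, 8))

optimizer = torch.optim.Adam(model.parameters(), lr=LR, weight_decay=WEIGHT_DECAY)
# Linear warmup: ramp the learning rate from ~0 to LR over the first WARMUP_STEPS updates.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: min((step + 1) / WARMUP_STEPS, 1.0))

for step in range(3):                          # dummy loop with random data
    x = torch.randn(BATCH_SIZE, 32)
    target = torch.randn(BATCH_SIZE, 8)
    loss = nn.functional.mse_loss(model(x), target)
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), GRAD_NORM_CLIP)
    optimizer.step()
    scheduler.step()
```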