HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning
Authors: Shengchao Hu, Ziqing Fan, Li Shen, Ya Zhang, Yanfeng Wang, Dacheng Tao
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations on a series of benchmarks demonstrate the superiority of HarmoDT, verifying the effectiveness of our approach. |
| Researcher Affiliation | Collaboration | 1Shanghai Jiao Tong University, China 2Shanghai AI Laboratory, China 3Sun Yat-sen University, China 4JD Explore Academy, China 5Nanyang Technological University, Singapore. |
| Pseudocode | Yes | Algorithm 1 HarmoDT |
| Open Source Code | Yes | Our code is available at: https://github.com/charleshsc/HarmoDT |
| Open Datasets | Yes | Our experiments utilize the Meta-World benchmark (Yu et al., 2020b), featuring 50 distinct manipulation tasks with shared dynamics, requiring a Sawyer robot to interact with various objects. |
| Dataset Splits | No | The paper describes dataset compositions (near-optimal, sub-optimal) and distinguishes between training and testing tasks (seen vs. unseen), but it does not specify explicit train/validation/test splits in the conventional sense (e.g., an 80/10/10 split) for model training and hyperparameter tuning. |
| Hardware Specification | Yes | We use NVIDIA GeForce RTX 3090 to train each model. |
| Software Dependencies | No | The paper states: "We build our policy as a Transformer-based model, which is based on minGPT open-source code." While it mentions a specific open-source project, it does not provide specific version numbers for any software dependencies, such as programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | The configuration for each training iteration is meticulously set, with a batch size of 8 and the utilization of the Adam optimizer, operating at a learning rate of 1e-4. The total number of training steps is established at 10 million. We build our policy as a Transformer-based model, which is based on minGPT open-source code. The specific model parameters and hyper-parameters utilized in our training process are outlined in Table 6. |
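
The experiment-setup row above quotes a concrete optimizer configuration (batch size 8, Adam, learning rate 1e-4, Transformer policy built on minGPT). The sketch below is a minimal, hedged illustration of that configuration in PyTorch, not the authors' implementation: the `TinyGPTPolicy` architecture, the state/action dimensions, the context length, and the random stand-in data are all assumptions made for illustration; only the batch size, optimizer, and learning rate come from the paper's stated setup.

```python
# Minimal sketch of the quoted training configuration (batch size 8, Adam, lr 1e-4),
# assuming a PyTorch implementation. The GPT-style policy below is a hypothetical
# stand-in for the paper's minGPT-based Decision Transformer, and the dimensions
# and data are placeholders, not values from the paper.
import torch
import torch.nn as nn


class TinyGPTPolicy(nn.Module):
    """Illustrative Transformer policy; not the HarmoDT architecture."""

    def __init__(self, state_dim, act_dim, embed_dim=128, n_layer=3, n_head=1, context_len=20):
        super().__init__()
        self.embed_state = nn.Linear(state_dim, embed_dim)
        self.pos_embed = nn.Parameter(torch.zeros(1, context_len, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=n_head, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=n_layer)
        self.predict_action = nn.Linear(embed_dim, act_dim)

    def forward(self, states):
        # states: (batch, context_len, state_dim)
        x = self.embed_state(states) + self.pos_embed[:, : states.size(1)]
        x = self.transformer(x)
        return self.predict_action(x)


# Values quoted from the paper's setup:
batch_size, lr = 8, 1e-4
# Assumed Meta-World-like sizes, purely for illustration:
state_dim, act_dim, context_len = 39, 4, 20

policy = TinyGPTPolicy(state_dim, act_dim, context_len=context_len)
optimizer = torch.optim.Adam(policy.parameters(), lr=lr)

# One illustrative gradient step on random tensors standing in for offline trajectories.
states = torch.randn(batch_size, context_len, state_dim)
target_actions = torch.randn(batch_size, context_len, act_dim)
pred_actions = policy(states)
loss = nn.functional.mse_loss(pred_actions, target_actions)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.4f}")
```

In an actual reproduction, this single step would be repeated for the 10 million training steps reported in the paper, with the model sizes and remaining hyper-parameters taken from the paper's Table 6 rather than the placeholders used here.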