Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning
Authors: Shengchao Hu, Ziqing Fan, Li Shen, Ya Zhang, Yanfeng Wang, Dacheng Tao
ICML 2024 | Venue PDF | LLM Run Details | Input Tokens: 24,379 | Output Tokens: 3,303 (including reasoning/thinking tokens)
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations on a series of benchmarks demonstrate the superiority of HarmoDT, verifying the effectiveness of our approach. |
| Researcher Affiliation | Collaboration | ¹Shanghai Jiao Tong University, China; ²Shanghai AI Laboratory, China; ³Sun Yat-sen University, China; ⁴JD Explore Academy, China; ⁵Nanyang Technological University, Singapore. |
| Pseudocode | Yes | Algorithm 1 HarmoDT |
| Open Source Code | Yes | Our code is available at: https://github.com/charleshsc/HarmoDT |
| Open Datasets | Yes | Our experiments utilize the Meta-World benchmark (Yu et al., 2020b), featuring 50 distinct manipulation tasks with shared dynamics, requiring a Sawyer robot to interact with various objects. |
| Dataset Splits | No | The paper describes dataset compositions (near-optimal, sub-optimal) and distinguishes between training and testing tasks (seen vs. unseen), but it does not specify explicit train/validation/test splits of a dataset for model training and hyperparameter tuning in the common sense (e.g., 80/10/10% splits). |
| Hardware Specification | Yes | We use NVIDIA GeForce RTX 3090 to train each model. |
| Software Dependencies | No | The paper states: "We build our policy as a Transformer-based model, which is based on minGPT open-source code." While it mentions a specific open-source project, it does not provide specific version numbers for any software dependencies, such as programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | The configuration for each training iteration is meticulously set, with a batch size of 8 and the utilization of the Adam optimizer, operating at a learning rate of 1e-4. The total number of training steps is established at 10 million. We build our policy as a Transformer-based model, which is based on minGPT open-source code. The specific model parameters and hyper-parameters utilized in our training process are outlined in Table 6. |
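
The training settings quoted in the Experiment Setup row could be recorded in a small configuration object such as the one below. This is only a sketch: the class name `TrainingConfig`, its field names, and the `samples_seen` helper are hypothetical and are not taken from the HarmoDT repository. Only the four values come from the quoted text, and the model hyper-parameters that the paper lists in its Table 6 are not reproduced here.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container; names are illustrative, not from the HarmoDT code."""

    batch_size: int = 8            # batch size quoted in the Experiment Setup row
    optimizer: str = "adam"        # Adam optimizer, as quoted
    learning_rate: float = 1e-4    # learning rate, as quoted
    total_steps: int = 10_000_000  # "10 million" training steps, as quoted

    def samples_seen(self) -> int:
        # Batch elements drawn over the full run, assuming one batch per step.
        return self.batch_size * self.total_steps


config = TrainingConfig()
print(asdict(config))
print(config.samples_seen())
```

Freezing the dataclass keeps the recorded values from being changed by accident when the object is passed around, which suits a record of reported settings.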