Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Hyper-Decision Transformer for Efficient Online Policy Adaptation
Authors: Mengdi Xu, Yuchen Lu, Yikang Shen, Shun Zhang, Ding Zhao, Chuang Gan
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate HDT's generalization capability on object manipulation tasks. We find that with a single expert demonstration and fine-tuning only 0.5% of DT parameters, HDT adapts faster to unseen tasks than fine-tuning the whole DT model. Finally, we explore a more challenging setting where expert actions are not available, and we show that HDT outperforms state-of-the-art baselines in terms of task success rates by a large margin. |
| Researcher Affiliation | Collaboration | Mengdi Xu (1), Yuchen Lu (2), Yikang Shen (3), Shun Zhang (3), Ding Zhao (1) & Chuang Gan (3, 4); (1) Carnegie Mellon University, (2) University of Montreal, Mila, (3) MIT-IBM Watson AI Lab, (4) UMass Amherst |
| Pseudocode | Yes | Algorithm 1: Hyper-network Training; Algorithm 2: Efficient Policy Adaptation without Expert Actions (meta-LfO); Algorithm 3: DT pre-training; Algorithm 4: Efficient Policy Adaptation with expert actions (meta-IL). |
| Open Source Code | No | Demos are available on our project page. Project page: https://sites.google.com/view/hdtforiclr2023/home (checked: the page states 'Code available soon!'). |
| Open Datasets | Yes | We conduct extensive experiments in the Meta-World benchmark Yu et al. (2020), which contains diverse manipulation tasks requiring fine-grained gripper control. We train HDT with 45 tasks and test its generalization capability on 5 test tasks with unseen objects, or with seen objects and different reward functions. ... We also add another set of locomotion tasks based on D4RL's pointmaze environment Fu et al. (2020). |
| Dataset Splits | No | The paper specifies the training and test tasks and how data were collected for each. However, it does not mention a separate validation set or give split sizes (e.g., percentages or counts). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper mentions 'd3rlpy package Takuma Seno (2021)' but does not specify its version number or other key software dependencies with version numbers (e.g., Python, PyTorch versions). |
| Experiment Setup | Yes | Table 5 (hyperparameters for DT-related models): K (length of context τ): 20; demonstration length: 200; training batch size for each task M: 16; pre-training iterations: 4000; learning rates αθ, αϕ, αψ: 1e-4; online rollout budget N_epi: 4000; fine-tuning iterations: 200; exploration ϵ: 0.2. |
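The hyperparameters reported under "Experiment Setup" can be collected into a single frozen config object, which is one way a reproduction attempt might pin them down. The sketch below is a minimal illustration. The class name `HDTHyperparams` and all field names are hypothetical, and the values are copied from Table 5 and from the Meta-World setup quoted above (45 training tasks, 5 test tasks). It is not the authors' code, which was not released at the time of checking.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class HDTHyperparams:
    """Hypothetical config; values as reported in Table 5 of the HDT paper."""

    context_length_K: int = 20              # K, length of context tau
    demonstration_length: int = 200
    batch_size_per_task_M: int = 16         # training batch size for each task
    pretrain_iterations: int = 4000
    lr_theta: float = 1e-4                  # alpha_theta
    lr_phi: float = 1e-4                    # alpha_phi
    lr_psi: float = 1e-4                    # alpha_psi
    online_rollout_budget_N_epi: int = 4000
    finetune_iterations: int = 200
    exploration_epsilon: float = 0.2
    num_train_tasks: int = 45               # Meta-World training tasks
    num_test_tasks: int = 5                 # Meta-World held-out test tasks

    def __post_init__(self) -> None:
        # Basic sanity checks; these constraints are assumptions, not from the paper.
        if not 0.0 <= self.exploration_epsilon <= 1.0:
            raise ValueError("exploration_epsilon must lie in [0, 1]")
        for name in ("context_length_K", "batch_size_per_task_M", "pretrain_iterations"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")


if __name__ == "__main__":
    cfg = HDTHyperparams()
    # Print every setting so a run log records the exact configuration used.
    for name, value in asdict(cfg).items():
        print(f"{name}: {value}")
```

Freezing the dataclass prevents accidental mutation of the settings during a run. Printing `asdict(cfg)` gives a plain record that can be stored alongside results.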