Hyper-Decision Transformer for Efficient Online Policy Adaptation
Authors: Mengdi Xu, Yuchen Lu, Yikang Shen, Shun Zhang, Ding Zhao, Chuang Gan
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate HDT's generalization capability on object manipulation tasks. We find that with a single expert demonstration and fine-tuning only 0.5% of DT parameters, HDT adapts faster to unseen tasks than fine-tuning the whole DT model. Finally, we explore a more challenging setting where expert actions are not available, and we show that HDT outperforms state-of-the-art baselines in terms of task success rates by a large margin. |
| Researcher Affiliation | Collaboration | Mengdi Xu1, Yuchen Lu2, Yikang Shen3, Shun Zhang3, Ding Zhao1 & Chuang Gan3,4 1 Carnegie Mellon University, 2 University of Montreal, Mila, 3 MIT-IBM Watson AI Lab, 4 UMass Amherst |
| Pseudocode | Yes | Algorithm 1 Hyper-network Training; Algorithm 2 Efficient Policy Adaptation without Expert Actions (meta-LfO); Algorithm 3 DT pre-training; Algorithm 4 Efficient Policy Adaptation with expert actions (meta-IL). |
| Open Source Code | No | Demos are available on our project page.1 Project Page: https://sites.google.com/view/hdtforiclr2023/home. (Checked the project page: 'Code available soon!') |
| Open Datasets | Yes | We conduct extensive experiments in the Meta-World benchmark Yu et al. (2020), which contains diverse manipulation tasks requiring fine-grained gripper control. We train HDT with 45 tasks and test its generalization capability in 5 testing tasks with unseen objects, or seen objects with different reward functions. ... We also add another set of locomotion tasks based on D4RL's pointmaze environment Fu et al. (2020). |
| Dataset Splits | No | The paper specifies training and testing tasks and data collection for each, but does not explicitly mention a separate validation set or provide specific splits (e.g., percentages or counts) for such a set within the datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper mentions 'd3rlpy package Takuma Seno (2021)' but does not specify its version number or other key software dependencies with version numbers (e.g., Python, PyTorch versions). |
| Experiment Setup | Yes | Table 5: Hyperparameters for DT-related models; K (length of context τ) 20; demonstration length 200; training batch size for each task M 16; pre-training iterations 4000; learning rates αθ, αϕ, αψ 1e-4; online rollout budget Nepi 4000; fine-tuning iterations 200; exploration ϵ 0.2. |
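The "fine-tuning only 0.5% of DT parameters" claim rests on HDT's core mechanism: a hyper-network maps a task representation to the weights of small adapter layers inserted into a frozen Decision Transformer, so only the hyper-network/adapter side is updated during adaptation. The following is a minimal NumPy sketch of that idea, not the authors' implementation; all dimensions (`D_MODEL`, `D_ADAPTER`, `D_TASK`) and function names are illustrative assumptions.

```python
import numpy as np

# Hypothetical sizes for illustration only (not from the paper).
D_MODEL = 64     # frozen transformer hidden size
D_ADAPTER = 4    # adapter bottleneck width
D_TASK = 16      # task-embedding size

rng = np.random.default_rng(0)

# Hyper-network: a single linear map from the task embedding to the
# flattened adapter weights (down-projection + up-projection).
n_adapter_params = D_MODEL * D_ADAPTER + D_ADAPTER * D_MODEL
W_hyper = rng.normal(scale=0.02, size=(D_TASK, n_adapter_params))

def generate_adapter(task_embedding):
    """Produce per-task adapter weight matrices from a task embedding."""
    flat = task_embedding @ W_hyper
    W_down = flat[: D_MODEL * D_ADAPTER].reshape(D_MODEL, D_ADAPTER)
    W_up = flat[D_MODEL * D_ADAPTER :].reshape(D_ADAPTER, D_MODEL)
    return W_down, W_up

def adapter_forward(h, W_down, W_up):
    """Bottleneck adapter with a residual connection (ReLU nonlinearity)."""
    return h + np.maximum(h @ W_down, 0.0) @ W_up

# One adaptation step: condition on a task, transform K=20 context tokens.
task_z = rng.normal(size=D_TASK)
W_down, W_up = generate_adapter(task_z)
h = rng.normal(size=(20, D_MODEL))  # K = 20 context tokens, as in Table 5
out = adapter_forward(h, W_down, W_up)
print(out.shape)  # (20, 64)
```

Because the frozen backbone's parameters dwarf `n_adapter_params`, gradients only flow through the hyper-network and adapters during fine-tuning, which is what makes the small-percentage update of the full model possible.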
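For anyone attempting a reproduction, the Table 5 hyperparameters quoted in the Experiment Setup row can be collected into a single config. The dictionary layout and key names below are our own; the values are the ones reported above.

```python
# Hyperparameters from Table 5 of the paper (values as quoted in the
# Experiment Setup row); key names are illustrative, not the authors'.
hdt_hparams = {
    "context_length_K": 20,             # length of context tau
    "demonstration_length": 200,
    "batch_size_per_task_M": 16,
    "pretraining_iterations": 4000,
    "learning_rate": 1e-4,              # alpha_theta, alpha_phi, alpha_psi
    "online_rollout_budget_Nepi": 4000,
    "finetuning_iterations": 200,
    "exploration_epsilon": 0.2,
}

print(len(hdt_hparams))  # 8
```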