Hyper-Decision Transformer for Efficient Online Policy Adaptation

Authors: Mengdi Xu, Yuchen Lu, Yikang Shen, Shun Zhang, Ding Zhao, Chuang Gan

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate HDT's generalization capability on object manipulation tasks. We find that with a single expert demonstration and fine-tuning only 0.5% of DT parameters, HDT adapts faster to unseen tasks than fine-tuning the whole DT model. Finally, we explore a more challenging setting where expert actions are not available, and we show that HDT outperforms state-of-the-art baselines in terms of task success rates by a large margin. (A hedged sketch of this parameter-efficient fine-tuning idea appears after the table.)
Researcher Affiliation | Collaboration | Mengdi Xu (1), Yuchen Lu (2), Yikang Shen (3), Shun Zhang (3), Ding Zhao (1) & Chuang Gan (3,4); (1) Carnegie Mellon University, (2) University of Montreal, Mila, (3) MIT-IBM Watson AI Lab, (4) UMass Amherst
Pseudocode | Yes | Algorithm 1: Hyper-network Training; Algorithm 2: Efficient Policy Adaptation without Expert Actions (meta-LfO); Algorithm 3: DT Pre-training; Algorithm 4: Efficient Policy Adaptation with Expert Actions (meta-IL).
Open Source Code | No | Demos are available on our project page. Project page: https://sites.google.com/view/hdtforiclr2023/home (checked the project page: 'Code available soon!').
Open Datasets | Yes | We conduct extensive experiments in the Meta-World benchmark (Yu et al., 2020), which contains diverse manipulation tasks requiring fine-grained gripper control. We train HDT with 45 tasks and test its generalization capability on 5 testing tasks with unseen objects, or seen objects with different reward functions. ... We also add another set of locomotion tasks based on D4RL's point-maze environment (Fu et al., 2020). (A sketch of loading the 45/5 Meta-World task split appears after the table.)
Dataset Splits | No | The paper specifies the training and testing tasks and the data collection for each, but does not mention a separate validation set or give explicit split sizes (e.g., percentages or counts) for one.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running the experiments.
Software Dependencies | No | The paper mentions the 'd3rlpy package (Takuma Seno, 2021)' but does not give its version number, nor version numbers for other key software dependencies (e.g., Python or PyTorch).
Experiment Setup | Yes | Table 5, hyperparameters for DT-related models: K (context length τ): 20; demonstration length: 200; training batch size per task M: 16; pre-training iterations: 4000; learning rates αθ, αϕ, αψ: 1e-4; online rollout budget N_epi: 4000; fine-tuning iterations: 200; exploration ε: 0.2.
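
For convenience, the reported Table 5 values can be gathered into a single configuration object. A minimal sketch in Python; the key names are illustrative and need not match the authors' code:

```python
# Hyperparameters as reported in Table 5 for DT-related models.
# Key names are illustrative; the authors' code may use different identifiers.
HDT_CONFIG = {
    "context_length_K": 20,               # length K of the context tau
    "demonstration_length": 200,          # length of the single expert demonstration
    "batch_size_per_task_M": 16,          # training batch size M for each task
    "pretrain_iterations": 4000,          # pre-training iterations
    "learning_rate": 1e-4,                # alpha_theta, alpha_phi, alpha_psi
    "online_rollout_budget_N_epi": 4000,  # online rollout budget N_epi
    "finetune_iterations": 200,           # fine-tuning iterations
    "exploration_epsilon": 0.2,           # exploration epsilon
}
```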
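
The 45-train / 5-test split described in the Open Datasets row matches Meta-World's ML45 benchmark. A minimal sketch of instantiating that split with the metaworld package, assuming ML45 is indeed the split the authors used (their data-collection scripts are not released):

```python
import random
import metaworld

# Construct the ML45 benchmark: 45 training task families and
# 5 held-out test task families, matching the 45/5 split in the paper.
ml45 = metaworld.ML45()

print(len(ml45.train_classes), "training task families;",
      len(ml45.test_classes), "test task families")

# Build one environment per training task family, each assigned a sampled task.
train_envs = {}
for name, env_cls in ml45.train_classes.items():
    env = env_cls()
    task = random.choice([t for t in ml45.train_tasks if t.env_name == name])
    env.set_task(task)
    train_envs[name] = env
```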
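
The '0.5% of DT parameters' figure in the abstract suggests a parameter-efficient scheme in which the DT backbone stays frozen and only small, hyper-network-initialized modules are updated during adaptation. Below is a minimal PyTorch sketch of such a setup; the bottleneck-adapter form, layer sizes, and hyper-network architecture are assumptions for illustration, not the authors' (unreleased) implementation:

```python
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Small bottleneck module inserted alongside a frozen transformer block."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 16):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual form: the frozen block's output passes through unchanged,
        # plus a learned low-dimensional correction.
        return x + self.up(torch.relu(self.down(x)))


class HyperNetwork(nn.Module):
    """Maps a demonstration/task embedding to a flat vector of adapter weights."""

    def __init__(self, task_embed_dim: int, n_adapter_params: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(task_embed_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_adapter_params),
        )

    def forward(self, task_embedding: torch.Tensor) -> torch.Tensor:
        return self.net(task_embedding)


def trainable_fraction(frozen_backbone: nn.Module, adapters: nn.Module) -> float:
    """Fraction of parameters updated during adaptation (adapters only)."""
    n_adapter = sum(p.numel() for p in adapters.parameters())
    n_backbone = sum(p.numel() for p in frozen_backbone.parameters())
    return n_adapter / (n_adapter + n_backbone)
```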