Prompting Decision Transformer for Few-Shot Policy Generalization

Authors: Mengdi Xu, Yikang Shen, Shun Zhang, Yuchen Lu, Ding Zhao, Joshua Tenenbaum, Chuang Gan

ICML 2022

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our experiments in five MuJoCo control benchmarks show that Prompt-DT is a strong few-shot learner without any extra finetuning on unseen target tasks. Prompt-DT outperforms its variants and strong meta offline RL baselines by a large margin with a trajectory prompt containing only a few timesteps. Prompt-DT is also robust to prompt length changes and can generalize to out-of-distribution (OOD) environments." |
| Researcher Affiliation | Collaboration | ¹Carnegie Mellon University; ²University of Montreal, Mila; ³MIT-IBM Watson AI Lab; ⁴Massachusetts Institute of Technology; ⁵UMass Amherst |
| Pseudocode | Yes | Algorithm 1: Prompt-DT Training; Algorithm 2: Trajectory Prompt Generation (Get Prompt); Algorithm 3: Prompt-DT Few-Shot Evaluation |
| Open Source Code | No | Project page: https://mxu34.github.io/prompt-dt/. There is no explicit statement about releasing source code and no direct link to a code repository. |
| Open Datasets | Yes | "We evaluate in five meta-RL control tasks described as follows (Finn et al., 2017a; Rothfuss et al., 2018; Mitchell et al., 2021; Yu et al., 2020a; Todorov et al., 2012). ... In Meta-World reach-v2 (Yu et al., 2020a) and Dial (Shiarlis et al., 2018), we collect expert trajectories with script expert policies provided in both environments." |
| Dataset Splits | No | The paper describes training and testing tasks but does not explicitly describe a validation set or how one would be split. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are given for the experiments. |
| Software Dependencies | No | No software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA) are provided. |
| Experiment Setup | Yes | Appendix A (Hyperparameters), Table 2, "Common Hyperparameters of Prompt-DT, Prompt-MT-BC, MT-ORL and MT-BC-Finetune": K (length of context τ) 20; training batch size per task 8; evaluation episodes per task 20; learning rate 1e-4; learning rate decay weight 1e-4; number of layers 3; number of attention heads 1; embedding dimension 128; activation ReLU. |
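The paper's Algorithm 2 (Get Prompt) samples short demonstration segments and concatenates them into a trajectory prompt of (return-to-go, state, action) tuples. A minimal NumPy sketch of that idea, assuming a hypothetical `get_prompt` helper and a simple dict-of-arrays trajectory format (both illustrative, not the authors' code):

```python
import numpy as np

def get_prompt(demo_trajectories, num_episodes=1, prompt_len=5, rng=None):
    """Hypothetical sketch of trajectory-prompt generation: sample
    `num_episodes` short (return-to-go, state, action) segments from stored
    demonstrations and concatenate them along the time axis."""
    rng = np.random.default_rng(0) if rng is None else rng
    segments = []
    for _ in range(num_episodes):
        # pick a random demonstration trajectory for the target task
        traj = demo_trajectories[rng.integers(len(demo_trajectories))]
        states = np.asarray(traj["states"])
        actions = np.asarray(traj["actions"])
        rewards = np.asarray(traj["rewards"], dtype=float)
        # return-to-go at step t = sum of rewards from t to the end
        rtg = np.cumsum(rewards[::-1])[::-1]
        # sample a contiguous segment of length prompt_len
        start = rng.integers(0, max(1, len(states) - prompt_len + 1))
        sl = slice(start, start + prompt_len)
        segments.append((rtg[sl], states[sl], actions[sl]))
    rtgs, ss, acts = zip(*segments)
    return np.concatenate(rtgs), np.concatenate(ss), np.concatenate(acts)
```

In the paper this prompt is prepended to the recent-context tokens fed to the transformer; the sketch above stops at assembling the prompt arrays.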
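The common hyperparameters reported in Appendix Table 2 could be collected into a single config object for a reimplementation; the dict below is an illustrative sketch (key names are my own, values are the paper's reported ones):

```python
# Illustrative config mirroring the common hyperparameters reported in
# Appendix Table 2 of the paper (key names are hypothetical).
PROMPT_DT_CONFIG = {
    "context_length_K": 20,          # K, length of context tau
    "train_batch_size_per_task": 8,
    "eval_episodes_per_task": 20,
    "learning_rate": 1e-4,
    "lr_decay_weight": 1e-4,
    "num_layers": 3,
    "num_attention_heads": 1,
    "embed_dim": 128,
    "activation": "relu",
}
```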