Prompting Decision Transformer for Few-Shot Policy Generalization
Authors: Mengdi Xu, Yikang Shen, Shun Zhang, Yuchen Lu, Ding Zhao, Joshua Tenenbaum, Chuang Gan
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments in five MuJoCo control benchmarks show that Prompt-DT is a strong few-shot learner without any extra finetuning on unseen target tasks. Prompt-DT outperforms its variants and strong meta offline RL baselines by a large margin with a trajectory prompt containing only a few timesteps. Prompt-DT is also robust to prompt length changes and can generalize to out-of-distribution (OOD) environments. |
| Researcher Affiliation | Collaboration | Carnegie Mellon University; University of Montreal, Mila; MIT-IBM Watson AI Lab; Massachusetts Institute of Technology; UMass Amherst |
| Pseudocode | Yes | Algorithm 1 Prompt-DT Training, Algorithm 2 Trajectory Prompt Generation (Get Prompt), Algorithm 3 Prompt-DT Few-Shot Evaluation (see the Python sketches of prompt generation and few-shot evaluation after this table) |
| Open Source Code | No | Project page: https://mxu34.github.io/prompt-dt/. The paper makes no explicit statement about releasing source code and gives no direct link to a code repository. |
| Open Datasets | Yes | We evaluate in five meta-RL control tasks described as follows (Finn et al., 2017a; Rothfuss et al., 2018; Mitchell et al., 2021; Yu et al., 2020a; Todorov et al., 2012). ... In Meta-World reach-v2 (Yu et al., 2020a) and Dial (Shiarlis et al., 2018), we collect expert trajectories with scripted expert policies provided in both environments. |
| Dataset Splits | No | The paper mentions training and testing sets/tasks but does not explicitly describe a validation set or its split. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory) are mentioned for running experiments. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions) are provided. |
| Experiment Setup | Yes | Appendix A. Hyperparameters ... Table 2. Common Hyperparameters of Prompt-DT, Prompt-MT-BC, MT-ORL and MT-BC-Finetune: K (length of context τ) = 20; training batch size for each task = 8; number of evaluation episodes for each task = 20; learning rate = 1e-4; learning rate decay weight = 1e-4; number of layers = 3; number of attention heads = 1; embedding dimension = 128; activation = ReLU (collected into a config sketch at the end of this section) |
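
The trajectory-prompt sampling routine (Algorithm 2, Get Prompt) is compact enough to sketch. Below is a minimal illustration, assuming each demonstration is stored as a dict of equal-length `states`, `actions`, and `rewards` arrays; the function name, data layout, and `prompt_len` parameter are our assumptions for illustration, not the authors' released code.

```python
import numpy as np

def get_prompt(demo_trajectories, prompt_len=5, rng=None):
    """Sample a short (return-to-go, state, action) segment from one
    demonstration to serve as the trajectory prompt (Algorithm 2 sketch)."""
    rng = rng or np.random.default_rng()
    traj = demo_trajectories[rng.integers(len(demo_trajectories))]
    T = len(traj["rewards"])
    # Pick a random window of up to prompt_len consecutive timesteps.
    start = int(rng.integers(max(T - prompt_len, 0) + 1))
    end = start + prompt_len
    # Returns-to-go are suffix sums of rewards, as in Decision Transformer.
    rtg = np.cumsum(traj["rewards"][::-1])[::-1]
    return {
        "returns_to_go": rtg[start:end],
        "states": traj["states"][start:end],
        "actions": traj["actions"][start:end],
    }
```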
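
Few-shot evaluation (Algorithm 3) then conditions the rollout on this fixed prompt plus the most recent K-step context, with no gradient updates on the target task. The loop below is a sketch under assumed interfaces: `model.predict_action` is a hypothetical method, and the classic Gym `reset`/`step` API is assumed.

```python
def evaluate_few_shot(model, env, prompt, target_return, K=20, max_steps=1000):
    """Roll out Prompt-DT conditioned on a fixed trajectory prompt and the
    last K timesteps of online context (Algorithm 3 sketch, no finetuning)."""
    states, actions, rtgs = [env.reset()], [], [target_return]
    episode_return = 0.0
    for _ in range(max_steps):
        # Hypothetical interface: predict the next action from the prompt
        # concatenated with the recent K-step (rtg, state, action) context.
        action = model.predict_action(prompt, rtgs[-K:], states[-K:], actions[-K:])
        state, reward, done, _ = env.step(action)  # classic Gym API assumed
        episode_return += reward
        actions.append(action)
        states.append(state)
        rtgs.append(rtgs[-1] - reward)  # decrement return-to-go by reward
        if done:
            break
    return episode_return
```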
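
Finally, the hyperparameters quoted from Table 2 read more easily as a single config mapping. The dict below is only our packaging of those published values; the key names are illustrative.

```python
# Values from Table 2 of the paper's appendix; key names are ours.
PROMPT_DT_CONFIG = {
    "context_length_K": 20,              # length of context τ
    "train_batch_size_per_task": 8,
    "eval_episodes_per_task": 20,
    "learning_rate": 1e-4,
    "learning_rate_decay_weight": 1e-4,  # as named in Table 2
    "num_layers": 3,
    "num_attention_heads": 1,
    "embedding_dim": 128,
    "activation": "relu",
}
```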