Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Prompting Decision Transformer for Few-Shot Policy Generalization

Authors: Mengdi Xu, Yikang Shen, Shun Zhang, Yuchen Lu, Ding Zhao, Joshua Tenenbaum, Chuang Gan

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments in five MuJoCo control benchmarks show that Prompt-DT is a strong few-shot learner without any extra finetuning on unseen target tasks. Prompt-DT outperforms its variants and strong meta offline RL baselines by a large margin with a trajectory prompt containing only a few timesteps. Prompt-DT is also robust to prompt length changes and can generalize to out-of-distribution (OOD) environments.
Researcher Affiliation | Collaboration | (1) Carnegie Mellon University; (2) University of Montreal, Mila; (3) MIT-IBM Watson AI Lab; (4) Massachusetts Institute of Technology; (5) UMass Amherst.
Pseudocode | Yes | Algorithm 1: Prompt-DT Training; Algorithm 2: Trajectory Prompt Generation (GetPrompt); Algorithm 3: Prompt-DT Few-Shot Evaluation.
Open Source Code | No | Project page: https://mxu34.github.io/PromptDT/. No explicit statement about releasing source code or direct link to a code repository.
Open Datasets | Yes | We evaluate in five meta-RL control tasks described as follows (Finn et al., 2017a; Rothfuss et al., 2018; Mitchell et al., 2021; Yu et al., 2020a; Todorov et al., 2012). ... In Meta-World reach-v2 (Yu et al., 2020a) and Dial (Shiarlis et al., 2018), we collect expert trajectories with script expert policies provided in both environments.
Dataset Splits | No | The paper mentions training and testing sets/tasks but does not explicitly describe a validation set or its split.
Hardware Specification | No | No specific hardware details (such as GPU/CPU models or memory) are mentioned for running experiments.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions) are provided.
Experiment Setup | Yes | Appendix: Prompting Decision Transformer for Few-Shot Policy Generalization ... A. Hyperparameters ... Table 2. Common Hyperparameters of Prompt-DT, Prompt-MT-BC, MT-ORL and MT-BC-Finetune ... K (length of context τ): 20; training batch size for each task: 8; number of evaluation episodes for each task: 20; learning rate: 1e-4; learning rate decay weight: 1e-4; number of layers: 3; number of attention heads: 1; embedding dimension: 128; activation: ReLU.
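
The common hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object for anyone attempting a reproduction. The following is a minimal sketch in Python, not the authors' code: the class name `PromptDTCommonConfig`, the field names, and the `validate` check are hypothetical, and only the values are taken from Table 2 as quoted above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PromptDTCommonConfig:
    """Common hyperparameters as quoted from Table 2.

    Field names are hypothetical; only the default values come from the report.
    """
    context_length: int = 20           # K (length of context tau)
    batch_size_per_task: int = 8       # training batch size for each task
    eval_episodes_per_task: int = 20   # number of evaluation episodes for each task
    learning_rate: float = 1e-4
    lr_decay_weight: float = 1e-4      # "learning rate decay weight" in Table 2
    num_layers: int = 3
    num_attention_heads: int = 1
    embedding_dim: int = 128
    activation: str = "ReLU"

    def validate(self) -> None:
        # Assumption: the usual multi-head attention convention that the
        # embedding dimension splits evenly across heads. The report does
        # not state this constraint.
        if self.embedding_dim % self.num_attention_heads != 0:
            raise ValueError("embedding_dim must be divisible by num_attention_heads")
        # Basic sanity check: all numeric settings should be positive.
        for name, value in asdict(self).items():
            if isinstance(value, (int, float)) and value <= 0:
                raise ValueError(f"{name} must be positive, got {value}")


config = PromptDTCommonConfig()
config.validate()
print(asdict(config))
```

Because hardware and software versions are not reported, a configuration like this captures only part of what a reproduction would need; task-specific settings outside Table 2 are not covered here.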