Decomposed Prompt Decision Transformer for Efficient Unseen Task Generalization
Authors: Hongling Zheng, Li Shen, Yong Luo, Tongliang Liu, Jialie Shen, Dacheng Tao
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluation on a series of Meta-RL benchmarks demonstrates the superiority of our approach. The project is available at https://github.com/ruthless-man/DPDT. ... In this section, we present an extensive evaluation of our proposed DPDT using widely recognized benchmarks. Additionally, we conduct empirical ablation studies to dissect and understand the individual contributions of the core components of our methodology. |
| Researcher Affiliation | Academia | Hongling Zheng¹, Li Shen², Yong Luo¹, Tongliang Liu³, Jialie Shen⁴, Dacheng Tao⁵ — ¹Wuhan University, ²Shenzhen Campus of Sun Yat-sen University, ³The University of Sydney, ⁴City, University of London, ⁵Nanyang Technological University |
| Pseudocode | Yes | Algorithm 1 Decomposed Prompt Tuning. Input: training task set S, offline datasets D_M, batch size M, learning rate α, training iterations N, teacher task prompts p_k^teacher. Initialize: a 12-layer, 12-head DPDT M from GPT2-SMALL; randomly initialize cross-task prompts P_c and low-rank vectors v_k, u_k. for t = 1 to N do; for k in S do; select a trajectory τ containing M samples from task k; compute P_k by Equation 3; compute L_MSE and L_dis according to Equations 4 and 5; compute the loss function by Equation 6; θ ← θ − α∇_θ L_Total; end for; end for. ... Algorithm 2 Test-Time Adaptation. Input: test sample set X, cross-task prompts P_c, μ_l(D), σ²_l(D), number of layers L. 1: for l = 1 to L do; 2: for i in X do; 3: compute H_{l,i} by feeding the concatenation of P_c and i into DPDT; 4: end for; 5: end for; 6: for l = 1 to L do; 7: compute μ_l(T) and σ²_l(T) by Equation 7; 8: end for; 9: compute the token distribution alignment loss by Equation 8; 10: optimize L_align to update P_c. |
| Open Source Code | Yes | The project is available at https://github.com/ruthless-man/DPDT. |
| Open Datasets | Yes | To ensure a fair comparison with existing multi-task offline reinforcement learning algorithms, we conducted verification of DPDT using the MuJoCo [43] and Meta-World [30] benchmarks, which serve as standard tasks in the domain of sequence offline RL... A detailed description of the datasets and the division of training and test tasks is provided in Appendix A. |
| Dataset Splits | No | While the paper provides detailed training and testing task divisions in Appendix A (Table 13), it does not explicitly define a separate 'validation' dataset split for model training or hyperparameter tuning. |
| Hardware Specification | Yes | All experiments were carried out on a server with 8 NVIDIA 3090 GPUs, each with 24GB of memory, using PyTorch [46] and Hugging Face Transformers libraries [47]. |
| Software Dependencies | No | The paper mentions the use of 'PyTorch [46] and Hugging Face Transformers libraries [47]' but does not specify their version numbers. |
| Experiment Setup | Yes | The experimental hyperparameter configurations are shown in Appendix B. The computer resources utilized by all methods are shown in Table 12. ... Table 8: Common hyperparameter configuration of DPDT and DPDT-WP. ... Table 9: Common hyperparameter configuration of MT-BC, MT-DT, Soft-prompt, HDT and Prompt-DT. |
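Algorithm 1 above composes a per-task prompt from a shared cross-task prompt and task-specific low-rank vectors, then combines an action-prediction MSE with a distillation term. The following is a minimal NumPy sketch of one such training-loss computation, not the paper's implementation: the composition rule (here, concatenating P_c with the outer product u_k v_k), the loss targets, and the weighting `lam` are all assumptions standing in for the paper's Equations 3–6.

```python
import numpy as np

rng = np.random.default_rng(0)

d, m, r = 8, 4, 2                     # embedding dim, prompt length, low-rank dim
P_c = rng.normal(size=(m, d))         # shared cross-task prompt
u_k, v_k = rng.normal(size=(m, r)), rng.normal(size=(r, d))  # task-k low-rank vectors
p_teacher = rng.normal(size=(m, d))   # pretrained teacher prompt for task k

def task_prompt(P_c, u_k, v_k):
    """Assumed stand-in for Eq. 3: shared prompt plus a low-rank
    task-specific component, concatenated along the sequence axis."""
    return np.concatenate([P_c, u_k @ v_k], axis=0)

def mse(a, b):
    return float(np.mean((a - b) ** 2))

P_k = task_prompt(P_c, u_k, v_k)      # prompt fed to the transformer for task k

# Stand-in for Eq. 4: MSE between predicted and dataset actions
pred_actions = rng.normal(size=(16, 3))
data_actions = rng.normal(size=(16, 3))
L_mse = mse(pred_actions, data_actions)

# Stand-in for Eq. 5: distill the low-rank task prompt toward the teacher prompt
L_dis = mse(u_k @ v_k, p_teacher)

lam = 0.1                             # assumed weighting for the combined loss (Eq. 6)
L_total = L_mse + lam * L_dis         # θ ← θ − α∇_θ L_total in the full algorithm
```

In the real method the gradient step updates the transformer parameters, P_c, and the low-rank vectors jointly; here only the forward loss computation is illustrated.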
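Algorithm 2 adapts the cross-task prompt at test time by matching per-layer token statistics on the test set to those recorded on the training data. A small NumPy sketch of the statistics and alignment loss follows; the exact forms of the paper's Equations 7–8 are not quoted above, so the squared-difference alignment used here is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
n_layers, n_test, d = 3, 32, 8        # layers L, test samples |X|, hidden dim

# H[l, i] stands for the layer-l hidden states obtained by feeding the
# concatenation of the cross-task prompt P_c and test sample i into DPDT.
H = rng.normal(loc=0.5, size=(n_layers, n_test, d))

# Per-layer statistics of the training data D, assumed recorded during training.
mu_D = np.zeros((n_layers, d))
var_D = np.ones((n_layers, d))

# Stand-in for Eq. 7: per-layer test-time statistics μ_l(T), σ²_l(T).
mu_T = H.mean(axis=1)
var_T = H.var(axis=1)

# Stand-in for Eq. 8: token distribution alignment loss. Minimising this
# with respect to P_c (which shapes H) adapts the prompt to the test task.
L_align = float(np.sum((mu_T - mu_D) ** 2) + np.sum((var_T - var_D) ** 2))
```

The actual adaptation step would backpropagate L_align through the frozen transformer into P_c only; everything else stays fixed.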