Zero-Shot Continuous Prompt Transfer: Generalizing Task Semantics Across Language Models

Authors: Zijun Wu, Yongkang Wu, Lili Mou

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results confirm the effectiveness of our method, showing that task semantics in continuous prompts can be generalized across various language models. Moreover, we find that combining task semantics from multiple source models can further enhance the performance of transfer.
Researcher Affiliation | Collaboration | Zijun Wu (Dept. Computing Science & Alberta Machine Intelligence Institute (Amii), University of Alberta); Yongkang Wu (Huawei Poisson Lab); Lili Mou (Dept. Computing Science & Amii, University of Alberta; Canada CIFAR AI Chair)
Pseudocode | No | The paper describes its methods using prose and mathematical equations but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/MANGA-UOFA/PTfer
Open Datasets | Yes | We utilized a widely used factual probing dataset, LAMA (Petroni et al., 2019), to evaluate the effectiveness of our continuous prompt transfer approach. We followed recent factual probing studies (Shin et al., 2020; Zhong et al., 2021) that focus on the TREx split of LAMA. (See the loading sketch after the table.)
Dataset Splits | No | The paper mentions running 'validation on target models' and choosing projectors with the 'highest validation performance', but does not provide specific details on the train/validation/test splits, such as percentages or sample counts for the LAMA dataset.
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU/CPU models, memory, or cloud computing specifications.
Software Dependencies | No | The paper does not provide specific software dependency versions (e.g., Python, PyTorch, or TensorFlow versions, or specific library versions).
Experiment Setup | Yes | In the main experiments, we set the default number of prompt embeddings m to 5, and the number of anchors k to 8192. We trained the neural network with 10 epochs using the Adam optimizer (Kingma & Ba, 2014). The learning rate was 5e-3 and the batch size was 16. The hidden dimension of this two-layer neural network was 768. (See the configuration sketch after the table.)
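The Open Datasets row points to the TREx split of LAMA. As a minimal sketch of how such probing triples could be read, assuming the one-jsonl-file-per-relation layout and the "sub_label"/"obj_label" field names of the public LAMA release (none of this is taken from the paper or its repository), one could write:

```python
# Hypothetical loader for LAMA T-REx probing triples.
# Assumptions: one jsonl file per relation (e.g., "P17.jsonl") and the
# "sub_label" / "obj_label" fields used in the public LAMA release;
# the directory path is illustrative only.
import json
from pathlib import Path

def load_trex(data_dir: str = "data/TREx"):
    """Yield (subject, relation_id, object) triples from T-REx jsonl files."""
    for relation_file in sorted(Path(data_dir).glob("*.jsonl")):
        relation_id = relation_file.stem  # e.g., "P17"
        with relation_file.open() as f:
            for line in f:
                record = json.loads(line)
                yield record["sub_label"], relation_id, record["obj_label"]
```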
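The Experiment Setup row fixes the reported hyperparameters: m = 5 prompt embeddings, k = 8192 anchors, 10 epochs of Adam with learning rate 5e-3, batch size 16, and a two-layer network with hidden dimension 768. Below is a minimal PyTorch sketch that wires those numbers into a two-layer projector; the projector's input/output dimensions, ReLU activation, MSE loss, and data interface are assumptions, since the paper's exact architecture beyond "two-layer neural network" is not quoted here.

```python
# Minimal sketch of the reported training configuration (not the authors' code).
# Only the quoted hyperparameters come from the paper; the input/output sizes,
# ReLU activation, MSE loss, and data interface are assumptions.
import torch
import torch.nn as nn

M_PROMPTS = 5      # default number of continuous prompt embeddings
K_ANCHORS = 8192   # number of anchors
HIDDEN = 768       # hidden dimension of the two-layer network
EPOCHS, LR, BATCH_SIZE = 10, 5e-3, 16

tgt_dim = 1024     # hypothetical target-model embedding size

# Two-layer projector mapping a source-side prompt representation
# (assumed here to have dimension K_ANCHORS) into the target embedding space.
projector = nn.Sequential(
    nn.Linear(K_ANCHORS, HIDDEN),
    nn.ReLU(),
    nn.Linear(HIDDEN, tgt_dim),
)
optimizer = torch.optim.Adam(projector.parameters(), lr=LR)
loss_fn = nn.MSELoss()

def train(loader):
    """loader is assumed to yield (source_repr, target_emb) tensor pairs of
    shape (BATCH_SIZE, M_PROMPTS, K_ANCHORS) and (BATCH_SIZE, M_PROMPTS, tgt_dim)."""
    projector.train()
    for _ in range(EPOCHS):
        for source_repr, target_emb in loader:
            optimizer.zero_grad()
            loss = loss_fn(projector(source_repr), target_emb)
            loss.backward()
            optimizer.step()
```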