Zero-Shot Continuous Prompt Transfer: Generalizing Task Semantics Across Language Models

Authors: Zijun Wu, Yongkang Wu, Lili Mou

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results confirm the effectiveness of our method, showing that task semantics in continuous prompts can be generalized across various language models. Moreover, we find that combining task semantics from multiple source models can further enhance the performance of transfer.
Researcher Affiliation | Collaboration | Zijun Wu (Dept. Computing Science & Alberta Machine Intelligence Institute (Amii), University of Alberta); Yongkang Wu (Huawei Poisson Lab); Lili Mou (Dept. Computing Science & Amii, University of Alberta; Canada CIFAR AI Chair)
Pseudocode | No | The paper describes its methods using prose and mathematical equations but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/MANGA-UOFA/PTfer
Open Datasets | Yes | We utilized a widely used factual probing dataset, LAMA (Petroni et al., 2019), to evaluate the effectiveness of our continuous prompt transfer approach. We followed recent factual probing studies (Shin et al., 2020; Zhong et al., 2021) that focus on the TREx split of LAMA. (See the loading sketch after the table.)
Dataset Splits | No | The paper mentions running 'validation on target models' and choosing projectors with the 'highest validation performance', but does not provide specific details on the train/validation/test splits, such as percentages or sample counts for the LAMA dataset.
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU/CPU models, memory, or cloud computing specifications.
Software Dependencies | No | The paper does not provide specific software dependency versions (e.g., Python, PyTorch, or TensorFlow versions, or specific library versions).
Experiment Setup | Yes | In the main experiments, we set the default number of prompt embeddings m to 5, and the number of anchors k to 8192. We trained the neural network with 10 epochs using the Adam optimizer (Kingma & Ba, 2014). The learning rate was 5e-3 and the batch size was 16. The hidden dimension of this two-layer neural network was 768. (See the configuration sketch after the table.)
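The Open Datasets row points to the TREx split of LAMA. As a minimal sketch of how such probing triples could be read, assuming the one-jsonl-file-per-relation layout and the "sub_label"/"obj_label" field names of the public LAMA release (none of this is taken from the paper or its repository), one could write:

```python
# Hypothetical loader for LAMA T-REx probing triples.
# Assumptions: one jsonl file per relation (e.g., "P17.jsonl") and the
# "sub_label" / "obj_label" fields used in the public LAMA release;
# the directory path is illustrative only.
import json
from pathlib import Path

def load_trex(data_dir: str = "data/TREx"):
    """Yield (subject, relation_id, object) triples from T-REx jsonl files."""
    for relation_file in sorted(Path(data_dir).glob("*.jsonl")):
        relation_id = relation_file.stem  # e.g., "P17"
        with relation_file.open() as f:
            for line in f:
                record = json.loads(line)
                yield record["sub_label"], relation_id, record["obj_label"]
```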
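The Experiment Setup row fixes the reported hyperparameters: m = 5 prompt embeddings, k = 8192 anchors, 10 epochs of Adam with learning rate 5e-3, batch size 16, and a two-layer network with hidden dimension 768. Below is a minimal PyTorch sketch that wires those numbers into a two-layer projector; the projector's input/output dimensions, ReLU activation, MSE loss, and data interface are assumptions, since the paper's exact architecture beyond "two-layer neural network" is not quoted here.

```python
# Minimal sketch of the reported training configuration (not the authors' code).
# Only the quoted hyperparameters come from the paper; the input/output sizes,
# ReLU activation, MSE loss, and data interface are assumptions.
import torch
import torch.nn as nn

M_PROMPTS = 5      # default number of continuous prompt embeddings
K_ANCHORS = 8192   # number of anchors
HIDDEN = 768       # hidden dimension of the two-layer network
EPOCHS, LR, BATCH_SIZE = 10, 5e-3, 16

tgt_dim = 1024     # hypothetical target-model embedding size

# Two-layer projector mapping a source-side prompt representation
# (assumed here to have dimension K_ANCHORS) into the target embedding space.
projector = nn.Sequential(
    nn.Linear(K_ANCHORS, HIDDEN),
    nn.ReLU(),
    nn.Linear(HIDDEN, tgt_dim),
)
optimizer = torch.optim.Adam(projector.parameters(), lr=LR)
loss_fn = nn.MSELoss()

def train(loader):
    """loader is assumed to yield (source_repr, target_emb) tensor pairs of
    shape (BATCH_SIZE, M_PROMPTS, K_ANCHORS) and (BATCH_SIZE, M_PROMPTS, tgt_dim)."""
    projector.train()
    for _ in range(EPOCHS):
        for source_repr, target_emb in loader:
            optimizer.zero_grad()
            loss = loss_fn(projector(source_repr), target_emb)
            loss.backward()
            optimizer.step()
```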