Universality and Limitations of Prompt Tuning
Authors: Yihan Wang, Jatin Chauhan, Wei Wang, Cho-Jui Hsieh
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theoretical claims are also corroborated by empirical results. We conduct empirical studies, including real-world datasets, to verify our theoretical claims. |
| Researcher Affiliation | Collaboration | Yihan Wang (UCLA, wangyihan617@gmail.com); Jatin Chauhan (UCLA, chauhanjatin100@gmail.com); Wei Wang (UCLA, weiwang@cs.ucla.edu); Cho-Jui Hsieh (Google and UCLA, chohsieh@cs.ucla.edu) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using a third-party library ('Huggingface Peft library') but does not state that the code for their own methodology is open-source or provide a link. |
| Open Datasets | Yes | On the dataset front, we utilize the RTE subtask of Super Glue dataset [Wang et al., 2019] and WMT14 En-Fr translation [Bojar et al., 2014]. |
| Dataset Splits | No | While the paper's figures show 'Validation Loss', it does not provide specific dataset split information (percentages, sample counts, or explicit methodology) for training, validation, or test sets. |
| Hardware Specification | Yes | All the experiments are run on a NVIDIA RTX A6000 GPU. |
| Software Dependencies | No | The paper mentions 'Huggingface Peft library' and 'Adam optimizer' but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | For experiments with Llama 7B model, we use batch size 32 and learning rate 0.001. For experiment on WMT14 En-Fr translation, we only compute the loss on the first 100 examples for computational efficiency. We use Adam optimizer and optimal learning rate from grid search at 0.1 for prompt-tuning and at 0.001 for fine-tuning in Section 7.2. In Section 7.3, we use the default loss function in Huggingface implementation for causal language models. We use prompt length m = 10 and the prompt tokens are initialized as the first m tokens in the model vocabulary. (An illustrative configuration sketch based on this description follows the table.) |
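
Since the paper does not release code, the following is a minimal sketch of how the reported prompt-tuning setup could be approximated with the Huggingface PEFT library, using the hyperparameters quoted above (prompt length m = 10, Adam, learning rate 0.1 for prompt tuning, batch size 32). The checkpoint name `huggyllama/llama-7b` and the way the vocabulary-based prompt initialization is copied into the soft prompt are assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): prompt tuning with the Huggingface PEFT
# library using the hyperparameters reported in the paper. The checkpoint name and
# the vocabulary-based initialization below are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model

model_name = "huggyllama/llama-7b"  # assumed checkpoint for "Llama 7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

m = 10  # prompt length m = 10, as reported
peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.RANDOM,  # overwritten below with the paper's init
    num_virtual_tokens=m,
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # only the m soft-prompt vectors are trainable

# Approximate "prompt tokens initialized as the first m tokens in the model
# vocabulary" by copying those token embeddings into the soft prompt.
with torch.no_grad():
    first_m = model.get_base_model().get_input_embeddings().weight[:m].clone()
    model.prompt_encoder["default"].embedding.weight.copy_(first_m)

# Adam with lr = 0.1 for prompt tuning (the grid-search optimum reported for Sec. 7.2);
# the paper uses batch size 32 for the Llama 7B experiments.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=0.1
)
```

Training would then iterate over RTE or WMT14 En-Fr batches of size 32 as usual, with gradients flowing only into the 10 soft-prompt embeddings while the base model stays frozen.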