Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Universality and Limitations of Prompt Tuning
Authors: Yihan Wang, Jatin Chauhan, Wei Wang, Cho-Jui Hsieh
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theoretical claims are also corroborated by empirical results. We conduct empirical studies, including real-world datasets, to verify our theoretical claims. |
| Researcher Affiliation | Collaboration | Yihan Wang (UCLA), Jatin Chauhan (UCLA), Wei Wang (UCLA), Cho-Jui Hsieh (Google and UCLA) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using a third-party library ('Huggingface Peft library') but does not state that the code for their own methodology is open-source or provide a link. |
| Open Datasets | Yes | On the dataset front, we utilize the RTE subtask of SuperGLUE dataset [Wang et al., 2019] and WMT14 En-Fr translation [Bojar et al., 2014]. |
| Dataset Splits | No | While the paper's figures show 'Validation Loss', it does not provide specific dataset split information (percentages, sample counts, or explicit methodology) for training, validation, or test sets. |
| Hardware Specification | Yes | All the experiments are run on a NVIDIA RTX A6000 GPU. |
| Software Dependencies | No | The paper mentions 'Huggingface Peft library' and 'Adam optimizer' but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | For experiments with Llama 7B model, we use batch size 32 and learning rate 0.001. For experiment on WMT14 En-Fr translation, we only compute the loss on the first 100 examples for computational efficiency. We use Adam optimizer and optimal learning rate from grid search at 0.1 for prompt-tuning and at 0.001 for fine-tuning in Section 7.2. In Section 7.3, we use the default loss function in Huggingface implementation for causal language models. We use prompt length m = 10 and the prompt tokens are initialized as the first m tokens in the model vocabulary. |
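
The Experiment Setup row reports a prompt length of m = 10, with prompt tokens initialized as the first m tokens in the model vocabulary. The following is a minimal sketch of that initialization step only, not the authors' code. The function names, the toy embedding table, and its sizes are hypothetical and chosen for illustration. A real run would go through the model's own embedding matrix and the Huggingface Peft library mentioned in the paper.

```python
import random

PROMPT_LENGTH = 10  # m = 10, as reported in the Experiment Setup row


def init_soft_prompt(embedding_table, m=PROMPT_LENGTH):
    """Return a separate copy of the embeddings for token ids 0..m-1.

    Hypothetical helper: mirrors "initialized as the first m tokens in the
    model vocabulary" on a plain list-of-lists embedding table.
    """
    if m > len(embedding_table):
        raise ValueError("prompt length exceeds vocabulary size")
    # Copy each row so that updating the prompt never alters the frozen table.
    return [list(row) for row in embedding_table[:m]]


def prepend_prompt(soft_prompt, input_embeddings):
    """Place the soft prompt in front of one sequence's token embeddings."""
    return soft_prompt + input_embeddings


# Toy stand-in for a model's embedding matrix (sizes are assumptions).
rng = random.Random(0)
vocab_size, dim = 50, 4
embedding_table = [
    [rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)
]

soft_prompt = init_soft_prompt(embedding_table)
sequence = [embedding_table[i] for i in (12, 7, 33)]  # arbitrary token ids
model_input = prepend_prompt(soft_prompt, sequence)
print(len(model_input))
```

In prompt tuning, only the rows of `soft_prompt` would be updated by the optimizer while the embedding table and model weights stay fixed, which is why the sketch copies the rows rather than aliasing them.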