Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency
Authors: Jerry Yao-Chieh Hu, Wei-Po Wang, Ammar Gilani, Chenyang Li, Zhao Song, Han Liu
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We investigate the statistical and computational limits of prompt tuning for transformer-based foundation models. Our key contributions are showing that prompt tuning on single-head transformers with only a single self-attention layer (i) is universal, and (ii) supports efficient (even almost-linear time) algorithms under the Strong Exponential Time Hypothesis (SETH). Statistically, we prove that prompt tuning on such simplest-possible transformers is a universal approximator for sequence-to-sequence Lipschitz functions. In addition, we provide an exponential-in-dL and -in-(1/ϵ) lower bound on the number of soft-prompt tokens required for prompt tuning to memorize any dataset with 1-layer, 1-head transformers. Computationally, we identify a phase transition in the efficiency of prompt tuning, determined by the norm of the soft-prompt-induced keys and queries, and provide an upper-bound criterion. Beyond this criterion, no sub-quadratic (efficient) algorithm for prompt tuning exists under SETH. Within this criterion, we showcase our theory by proving the existence of almost-linear-time prompt-tuning inference algorithms. |
| Researcher Affiliation | Academia | Northwestern University; National Taiwan University; Fuzhou University; UC Berkeley |
| Pseudocode | No | The paper discusses theoretical concepts, proofs, and the existence of algorithms but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The text states: 'Full version and future updates are on arXiv.' This points to where the full version of the paper and future updates can be found, but does not state that code for the described methodology is open source or otherwise available. |
| Open Datasets | No | The paper defines abstract input/output sequences (X, Y) and a finetuning dataset S = {(X^(i), Y^(i))}_{i ∈ [N]} in a theoretical context, but does not mention any specific publicly available datasets or provide access information for any datasets used for experimental evaluation. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments involving datasets, thus there are no dataset splits mentioned. |
| Hardware Specification | No | The paper is theoretical and focuses on statistical and computational limits, not empirical results. Therefore, no specific hardware used for running experiments is mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe any practical implementation or experiments that would require specific software dependencies with version numbers. |
| Experiment Setup | No | The paper focuses on theoretical analysis, proving universality, memorization capacity, and computational limits. It does not describe an experimental setup with hyperparameters, training configurations, or system-level settings. |
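For readers unfamiliar with the setting the paper analyzes, the sketch below illustrates what "prompt tuning on a 1-layer, 1-head transformer" means mechanically: trainable soft-prompt tokens are prepended to the input sequence, while the transformer's own weights stay frozen. This is a minimal NumPy illustration of the general setup, not the paper's method; all names (`single_head_attention`, `P`, the dimensions) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, L, m = 4, 3, 2  # embedding dim, input sequence length, soft-prompt tokens

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def single_head_attention(X, W_q, W_k, W_v):
    # One self-attention head, one layer: the regime the paper studies.
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    A = softmax(Q @ K.T / np.sqrt(d), axis=-1)
    return A @ V

# Frozen pretrained weights: prompt tuning never updates these.
W_q, W_k, W_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))

# Trainable soft prompt P: the ONLY parameters prompt tuning optimizes.
P = rng.standard_normal((m, d)) * 0.1

X = rng.standard_normal((L, d))      # input sequence embeddings
X_prompted = np.vstack([P, X])       # prepend soft-prompt tokens
Y = single_head_attention(X_prompted, W_q, W_k, W_v)[m:]  # drop prompt rows
print(Y.shape)  # (3, 4)
```

The paper's lower bound concerns how large `m` must grow (exponentially in dL and in 1/ϵ) for such a prompt to memorize an arbitrary dataset, and its efficiency results hinge on the norms of the keys and queries induced by rows of `P`.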