Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency

Authors: Jerry Yao-Chieh Hu, Wei-Po Wang, Ammar Gilani, Chenyang Li, Zhao Song, Han Liu

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type: Theoretical. LLM Response: We investigate the statistical and computational limits of prompt tuning for transformer-based foundation models. Our key contributions show that prompt tuning on single-head transformers with only a single self-attention layer (i) is universal, and (ii) supports efficient (even almost-linear time) algorithms under the Strong Exponential Time Hypothesis (SETH). Statistically, we prove that prompt tuning on such simplest-possible transformers is a universal approximator for sequence-to-sequence Lipschitz functions. In addition, we provide an exponential-in-dL and -in-(1/ϵ) lower bound on the number of soft-prompt tokens required for prompt tuning to memorize any dataset with 1-layer, 1-head transformers. Computationally, we identify a phase transition in the efficiency of prompt tuning, determined by the norm of the soft-prompt-induced keys and queries, and provide an upper-bound criterion. Beyond this criterion, no sub-quadratic (efficient) algorithm for prompt tuning exists under SETH. Within this criterion, we showcase our theory by proving the existence of almost-linear-time prompt-tuning inference algorithms.
Researcher Affiliation: Academia. LLM Response: Northwestern University; National Taiwan University; Fuzhou University; UC Berkeley.
Pseudocode: No. LLM Response: The paper discusses theoretical concepts, proofs, and the existence of algorithms, but does not contain any structured pseudocode or algorithm blocks.
Open Source Code: No. LLM Response: The text states: 'Full version and future updates are on arXiv.' This points to the full paper and future updates, but does not state that code for the described methodology is open-source or otherwise available.
Open Datasets: No. LLM Response: The paper defines abstract input/output sequences (X, Y) and a finetuning dataset S = {(X^(i), Y^(i)) : i ∈ [N]} in a theoretical context, but does not mention any specific publicly available datasets or provide access information for any datasets used in experimental evaluation.
Dataset Splits: No. LLM Response: The paper is theoretical and does not conduct experiments involving datasets, so no dataset splits are mentioned.
Hardware Specification: No. LLM Response: The paper is theoretical and focuses on statistical and computational limits, not empirical results; no hardware used for running experiments is mentioned.
Software Dependencies: No. LLM Response: The paper is theoretical and does not describe any practical implementation or experiments that would require specific software dependencies with version numbers.
Experiment Setup: No. LLM Response: The paper focuses on theoretical analysis, proving universality, memorization capacity, and computational limits. It does not describe an experimental setup with hyperparameters, training configurations, or system-level settings.
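The abstract above studies prompt tuning on a 1-layer, 1-head self-attention transformer, where trainable soft-prompt tokens are prepended to a frozen model's input. As a minimal NumPy sketch of that setting (not the paper's construction; all names, shapes, and weights here are illustrative assumptions):

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def prompt_tuned_attention(X, P, Wq, Wk, Wv):
    """Single-layer, single-head self-attention over [P; X].

    X  : (n, d) input token embeddings (frozen)
    P  : (m, d) soft-prompt tokens -- the only trainable parameters
    Wq, Wk, Wv : (d, d) frozen attention weights
    Returns outputs for the n input positions only.
    """
    Z = np.vstack([P, X])                       # prepend soft prompt
    Q, K, V = Z @ Wq, Z @ Wk, Z @ Wv
    A = softmax(Q @ K.T / np.sqrt(Z.shape[1]))  # attention weights
    return (A @ V)[P.shape[0]:]                 # drop prompt positions

rng = np.random.default_rng(0)
d, n, m = 8, 5, 3
X = rng.standard_normal((n, d))
P = rng.standard_normal((m, d))   # in practice optimized by gradient descent
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
Y = prompt_tuned_attention(X, P, Wq, Wk, Wv)
print(Y.shape)  # -> (5, 8)
```

In the paper's computational results, the efficiency phase transition is governed by the magnitudes of the prompt-induced queries and keys (roughly, the entries of Q and K above); this sketch only illustrates the forward pass, not that criterion.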