Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Prompting a Pretrained Transformer Can Be a Universal Approximator
Authors: Aleksandar Petrov, Philip Torr, Adel Bibi
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Despite the widespread adoption of prompting, prompt tuning and prefix-tuning of transformer models, our theoretical understanding of these fine-tuning methods remains limited. A key question is whether one can arbitrarily modify the behavior of a pretrained model by prompting or prefix-tuning it. Formally, whether prompting and prefix-tuning a pretrained model can universally approximate sequence-to-sequence functions. This paper answers in the affirmative and demonstrates that much smaller pretrained models than previously thought can be universal approximators when prefixed. In fact, prefix-tuning a single attention head is sufficient to approximate any continuous function making the attention mechanism uniquely suited for universal approximation. Moreover, any sequence-to-sequence function can be approximated by prefixing a transformer with depth linear in the sequence length. Beyond these density-type results, we also offer Jackson-type bounds on the length of the prefix needed to approximate a function to a desired precision. |
| Researcher Affiliation | Academia | Aleksandar Petrov¹, Philip H.S. Torr¹, Adel Bibi¹. ¹Department of Engineering Science, University of Oxford, UK. |
| Pseudocode | No | The paper presents theoretical results and proofs (e.g., Theorem 1, Lemma 1) but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | This is a theoretical paper focused on mathematical proofs and universal approximation; therefore, it does not provide open-source code for a specific methodology or implementation. |
| Open Datasets | No | This is a theoretical paper that mathematically defines concept classes and universal approximation properties, rather than conducting experiments on datasets. Therefore, no dataset is used or made publicly available for training. |
| Dataset Splits | No | This paper is theoretical and does not involve empirical experiments or dataset splits for training, validation, or testing. |
| Hardware Specification | No | This is a theoretical paper that does not report on empirical experiments; therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | This is a theoretical paper that does not report on empirical experiments; therefore, no software dependencies with version numbers are mentioned. |
| Experiment Setup | No | This is a theoretical paper that does not describe any empirical experiments or their setup, including hyperparameters or training configurations. |