Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Prompt Tuning Transformers for Data Memorization

Authors: Haiyu Wang, Yuanyuan Lin

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this paper, we provide both theoretical and empirical analyses of data memorization ability of prompt-tuned Transformers. Building on recent theoretical frameworks, we derive an upper bound on the required prompt length for exact memorization of finite datasets and establish a trade-off between prompt length and the number of autoregressive generation steps. Specifically, we show that a constant-size Transformer can memorize n input-output pairs with prompts of length O( n N), where N denotes the sequence length. Empirical results further demonstrate that prompt-tuned, randomly initialized Transformers are able to effectively memorize finite datasets. These models also capture the intrinsic low-rank structure of the data, leading to a reduction in the required prompt length. Finally, we analyze how the initialization of the Transformer backbone affects the performance of prompt tuning. Our findings provide new insights into the expressivity, efficiency, and underlying mechanisms of prompt tuning, bridging theoretical memorization limits with observed empirical behaviors.
Researcher Affiliation	Academia	Haiyu Wang Department of Statistics and Data Science The Chinese University of Hong Kong Hai Yu EMAIL Yuanyuan Lin Department of Statistics and Data Science The Chinese University of Hong Kong EMAIL
Pseudocode	No	The paper defines mathematical formulations of Transformer components like self-attention and feed-forward layers, and provides formal definitions for concepts like autoregressive generation and prompt tuning. However, it does not include any structured pseudocode or algorithm blocks.
Open Source Code	No	The code will be publicly available upon acceptance.
Open Datasets	Yes	The data points to be memorized are randomly sampled from the IMDb [Maas et al., 2011] dataset. ... We randomly sample 1000 samples from SST-2 dataset [Socher et al., 2013], which are truncated to a length of 8.
Dataset Splits	No	For dataset sizes of 1600, 2500, and 3600. ... The training dataset size is 2000 and test dataset size is 200. While specific sizes are mentioned, the paper does not provide explicit percentages or methodologies for how the training, validation, and test splits were created for all experiments, nor does it cite standard splits consistently for all datasets used.
Hardware Specification	Yes	All the experiments are conducted on one NVIDIA T4 GPU.
Software Dependencies	No	Our code is based on standard PyTorch modules. We use the Roberta-base (12 heads and 12 layers) implementation of Hugging Face [Wolf et al., 2019]. The paper mentions PyTorch and Hugging Face, but does not provide specific version numbers for these software components.
Experiment Setup	Yes	Number of training epochs is 1000, learning rate is 0.005. Optimizer is AdamW [Loshchilov and Hutter, 2017]. ... Number of training epochs is 100, learning rate is 0.001. Optimizer is AdamW. We use a two-layer randomly initialized Transformer with an embedding size of 512... The input sequence length is 16 where the first 8 tokens are prompt tokens and the remaining 8 are data tokens.
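
The input layout quoted under Experiment Setup (a length-16 sequence whose first 8 positions are prompt tokens and whose remaining 8 are data tokens) can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' code: the names `SequenceLayout`, `is_prompt`, and `assemble` are hypothetical, and the placeholder token values in the usage lines are made up.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class SequenceLayout:
    """Hypothetical helper; the 8 + 8 split comes from the quoted setup."""
    prompt_len: int = 8  # leading prompt positions
    data_len: int = 8    # trailing data positions

    @property
    def total_len(self) -> int:
        return self.prompt_len + self.data_len

    def is_prompt(self) -> List[bool]:
        # True for prompt positions, False for data positions.
        return [i < self.prompt_len for i in range(self.total_len)]

    def assemble(self, prompt: Sequence, data: Sequence) -> list:
        # Concatenate prompt and data tokens, checking both lengths first.
        if len(prompt) != self.prompt_len or len(data) != self.data_len:
            raise ValueError("prompt/data length does not match the layout")
        return list(prompt) + list(data)


layout = SequenceLayout()
# Placeholder tokens, purely illustrative.
sequence = layout.assemble(["p"] * 8, list(range(8)))
```

In prompt tuning generally, only the prompt positions are trained while the backbone stays frozen; a mask like the one returned by `is_prompt` is one way to mark which positions those are.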