Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Towards Understanding the Universality of Transformers for Next-Token Prediction
Authors: Michaël E. Sander, Gabriel Peyré
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present experimental results that validate our theoretical findings and suggest their applicability to more general mappings f. The paper includes a dedicated section titled 'EXPERIMENTS' where empirical results are presented and discussed, including error curves and training details. |
| Researcher Affiliation | Academia | Michaël E. Sander & Gabriel Peyré Ecole Normale Supérieure, CNRS Paris, France EMAIL, EMAIL. Both authors are affiliated with academic institutions (Ecole Normale Supérieure, CNRS), and their email domains (.polytechnique.org, .ens.fr) correspond to academic institutions. |
| Pseudocode | No | The paper describes methods mathematically and conceptually (e.g., equation (7) for causal kernel descent and Figure 1 for illustration), but it does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | Our code will be open-sourced. This statement indicates a future intention to release code, but does not provide concrete access to source code at the time of publication. |
| Open Datasets | No | We take d = 15, n = 6, and consider instance (2) with randomly generated Ωs and x1s, for a dataset with 2^12 elements, that we split into train, validation, and test sets with respective sizes of 60%, 20%, and 20% of the original dataset. The dataset used was generated by the authors ('randomly generated') and no public access information (link, DOI, repository, or citation) is provided. |
| Dataset Splits | Yes | We take d = 15, n = 6, and consider instance (2) with randomly generated Ωs and x1s, for a dataset with 2^12 elements, that we split into train, validation, and test sets with respective sizes of 60%, 20%, and 20% of the original dataset. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions using 'Adam (Kingma & Ba, 2014)' for training but does not provide specific version numbers for any software, libraries, or frameworks used (e.g., PyTorch, TensorFlow, Python version). |
| Experiment Setup | Yes | We take d = 15, n = 6, and consider instance (2)... We train the model using Adam (Kingma & Ba, 2014) on the Mean Squared Error (MSE) loss for next-token prediction on sequences of length T = 100... We train for 5000 epochs with early stopping. |
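For reference, the 60/20/20 split reported in the Dataset Splits and Experiment Setup rows can be sketched as below. This is a minimal illustration only: the paper's actual data generation (instance (2) with randomly generated Ωs and x1s) is not reproduced, so standard-normal placeholder data is used, and the dataset size 2^12 is the figure quoted from the paper.

```python
import numpy as np

# Placeholder data: the paper's instance (2) generation is NOT reproduced
# here; we only illustrate the reported 60/20/20 split on 2^12 examples
# of dimension d = 15 (values quoted from the paper).
rng = np.random.default_rng(0)
N, d = 2**12, 15
data = rng.standard_normal((N, d))

# Shuffle indices, then carve out 60% train / 20% validation / 20% test.
perm = rng.permutation(N)
n_train = int(0.6 * N)
n_val = int(0.2 * N)
train = data[perm[:n_train]]
val = data[perm[n_train:n_train + n_val]]
test = data[perm[n_train + n_val:]]

print(len(train), len(val), len(test))  # 2457 819 820
```

Note that with N = 4096 the 60/20/20 proportions do not divide evenly, so the test set absorbs the one leftover example; the paper does not specify how rounding was handled.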