Transformers as Algorithms: Generalization and Stability in In-context Learning
Authors: Yingcong Li, Muhammed Emrullah Ildiz, Dimitris Papailiopoulos, Samet Oymak
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we provide numerical evaluations that (1) demonstrate transformers can indeed implement near-optimal algorithms on classical regression problems with i.i.d. and dynamic data, (2) provide insights on stability, and (3) verify our theoretical predictions. |
| Researcher Affiliation | Academia | 1{yli692,mildi001}@ucr.edu, University of California, Riverside. 2dimitris@papail.io, University of Wisconsin, Madison. 3University of Michigan, Ann Arbor. Correspondence to: Samet Oymak <oymak@umich.edu>. |
| Pseudocode | No | The paper describes algorithms and methods in prose and mathematical notation but does not contain any formal pseudocode blocks or algorithm listings. |
| Open Source Code | Yes | Our code is available at https://github.com/yingcong-li/transformers-as-algorithms. |
| Open Datasets | No | The paper describes using synthetic or generated datasets for experiments (e.g., 'random linear regression tasks', 'linear data with covariance prior', 'partially-observed LDS') and refers to code for generation, but does not provide specific access information (links, DOIs, citations to publicly available versions) for these datasets themselves. |
| Dataset Splits | No | The paper does not specify explicit training, validation, or test dataset split percentages or sample counts. It refers to training and evaluation on tasks, but not in terms of common data splits for reproducibility. |
| Hardware Specification | No | The paper describes the GPT-2 architecture used but does not provide specific hardware details such as GPU/CPU models, memory, or computational resources used for training or evaluation. |
| Software Dependencies | No | The paper mentions 'Python 3.8' and 'Adam optimizer' but does not list specific version numbers for other key software libraries or frameworks (e.g., PyTorch, TensorFlow) that would be necessary for reproduction. |
| Experiment Setup | Yes | All experiments use learning rate 0.0001 and Adam optimizer. For Fig. 2(c) and Fig. 11, we fix the batch size to 64 and train with 500k/100k iterations. |
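
For concreteness, the sketch below shows how the reported setup could be reconstructed from the details quoted in the table alone: Adam with learning rate 1e-4, batch size 64, and freshly sampled random linear regression tasks at each iteration. `TinyICLModel` and `sample_linear_regression_batch` are hypothetical stand-ins, not the authors' implementation; the paper trains a GPT-2 architecture, and the released code is in the repository linked above.

```python
import torch


def sample_linear_regression_batch(batch_size=64, n_points=20, dim=10):
    """One random linear regression task per sequence: y_i = <w, x_i>."""
    w = torch.randn(batch_size, dim, 1)           # fresh task weights per sequence
    x = torch.randn(batch_size, n_points, dim)    # in-context inputs
    y = (x @ w).squeeze(-1)                       # noiseless labels for simplicity
    return x, y


class TinyICLModel(torch.nn.Module):
    """Small causal transformer standing in for the paper's GPT-2 backbone."""

    def __init__(self, dim, hidden=64, layers=2, heads=4):
        super().__init__()
        self.embed = torch.nn.Linear(dim + 1, hidden)  # embed (x_t, y_{t-1}) tokens
        enc_layer = torch.nn.TransformerEncoderLayer(
            d_model=hidden, nhead=heads, batch_first=True)
        self.encoder = torch.nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.head = torch.nn.Linear(hidden, 1)

    def forward(self, x, y):
        # Predict y_t from the prompt (x_1, y_1, ..., x_{t-1}, y_{t-1}, x_t).
        y_prev = torch.cat([torch.zeros_like(y[:, :1]), y[:, :-1]], dim=1)
        tokens = self.embed(torch.cat([x, y_prev.unsqueeze(-1)], dim=-1))
        n = x.shape[1]
        causal_mask = torch.triu(torch.full((n, n), float("-inf")), diagonal=1)
        return self.head(self.encoder(tokens, mask=causal_mask)).squeeze(-1)


dim = 10
model = TinyICLModel(dim)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # settings from the table

for step in range(1000):  # the paper reports 500k/100k iterations
    x, y = sample_linear_regression_batch(batch_size=64, dim=dim)
    loss = torch.mean((model(x, y) - y) ** 2)  # average in-context squared error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Shifting the labels by one position lets the model predict each y_t only from the examples seen earlier in the prompt, which matches the in-context prediction setting the paper studies; architecture size, context length, and noise level here are illustrative choices, not values taken from the paper.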