Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Looped Transformers are Better at Learning Learning Algorithms
Authors: Liu Yang, Kangwook Lee, Robert D Nowak, Dimitris Papailiopoulos
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results suggest that the looped transformer achieves performance comparable to the standard transformer in solving various data-fitting problems, while utilizing less than 10% of the parameter count. |
| Researcher Affiliation | Academia | Liu Yang, Kangwook Lee, Robert D. Nowak & Dimitris Papailiopoulos, University of Wisconsin, Madison, USA |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/Leiay/looped_transformer. |
| Open Datasets | Yes | we have conducted additional experiments using 10 datasets from OpenML (Vanschoren et al., 2013) |
| Dataset Splits | Yes | During training, we uniformly sampled prompts from 9 datasets, where for each prompt, we first randomly selected a training set, then randomly selected k + 1 samples from this training set, with k being the number of in-context samples. During testing, we applied a similar approach for each test sample, selecting k in-context samples from the test dataset, with care taken to exclude the test sample itself from these in-context pairs. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments with specific models or specifications. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer' and 'GPT-2 decoder model' but does not specify versions for programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | Specifically, we employ a GPT-2 model with an embedding dimension of D = 256 and h = 8 attention heads. The standard (unlooped) transformer has L = 12 layers, and the looped transformer has L = 1 layer. ... train with Adam optimizer, learning rate 0.0001, no weight decay or other explicit regularization... we adopt b = 20 and T = 15 for the linear regression task. |
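
The prompt construction quoted in the Dataset Splits row can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the function names `sample_train_prompt` and `sample_test_prompt`, the `(features, label)` sample layout, sampling without replacement, and treating the extra (k + 1)-th training sample as the query are all assumptions made for the example.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical sample layout: (feature vector, label).
Sample = Tuple[Sequence[float], float]


def sample_train_prompt(
    train_sets: List[List[Sample]], k: int, rng: random.Random
) -> Tuple[List[Sample], Sample]:
    """Pick one training set uniformly, then draw k + 1 distinct samples.

    Assumption: the first k samples serve as in-context pairs and the
    remaining one is the query to be predicted.
    """
    dataset = rng.choice(train_sets)
    picked = rng.sample(dataset, k + 1)  # sampling without replacement
    return picked[:k], picked[k]


def sample_test_prompt(
    test_set: List[Sample], query_index: int, k: int, rng: random.Random
) -> Tuple[List[Sample], Sample]:
    """Draw k in-context samples from the test set, excluding the query itself."""
    candidates = [i for i in range(len(test_set)) if i != query_index]
    context_idx = rng.sample(candidates, k)
    return [test_set[i] for i in context_idx], test_set[query_index]
```

Excluding `query_index` before sampling is what keeps the test sample out of its own in-context pairs, which is the precaution the quoted passage describes.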