How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding

Authors: Yuchen Li, Yuanzhi Li, Andrej Risteski

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Precisely, we show, through a combination of mathematical analysis and experiments on Wikipedia data and synthetic data modeled by Latent Dirichlet Allocation (LDA), that the embedding layer and the self-attention layer encode the topical structure. We analyze properties of the training dynamics via extensive experimental analysis.
Researcher Affiliation | Collaboration | 1 Carnegie Mellon University, 2 Microsoft Research. Correspondence to: Yuchen Li <yuchenl4@cs.cmu.edu>.
Pseudocode | No | The paper describes mathematical analyses and experimental procedures but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code is released at https://github.com/YuchenLi01/transformer_topic_model_LDA
Open Datasets | Yes | We focus on understanding the optimization dynamics of transformers in a simple sandbox: a single-layer transformer trained on (synthetic) data following a topic model distribution, and validate that our results robustly transfer to real data (Wikipedia; Wikimedia Foundation, 2023). In our synthetic data experiments, we use a finite N and generate data using an LDA model (Blei et al., 2003). (See the data-generation sketch below this table.)
Dataset Splits | No | The paper mentions using 'synthetic data' and 'Wikipedia data' for experiments but does not explicitly provide details about train/validation/test splits (e.g., specific percentages, sample counts, or references to predefined splits) needed to reproduce the data partitioning.
Hardware Specification | No | The paper does not mention any specific hardware components (e.g., GPU models, CPU types, memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions using 'Huggingface (Wolf et al., 2020)' for pre-trained models and tokenizers, and refers to the 'standard implementation in Wolf et al. (2020)'. However, it does not provide specific version numbers for any software libraries or dependencies (e.g., the PyTorch or Transformers library versions).
Experiment Setup | Yes | In our experiments, we generate data following Section 3.1 with T = 10, v = 10, and N uniformly randomly chosen from [100, 150]... Our training objective follows Section 3.2 with p_m = 0.15, p_c = 0.1, p_r = 0.1, following Devlin et al. (2019). We use the model architecture following Section 3.3 but add back the bias terms b_K, b_Q, b_V, following the standard implementation in Wolf et al. (2020). (See the masking-objective sketch below this table.)
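
As a companion to the Open Datasets row, here is a minimal sketch of LDA-style synthetic data generation in the spirit of Blei et al. (2003), using the reported T = 10, v = 10 and document lengths drawn uniformly from [100, 150]. The block-structured topic-word distributions, the Dirichlet concentrations alpha and beta, and the name generate_lda_corpus are illustrative assumptions, not the paper's exact construction in Section 3.1.

```python
import numpy as np

def generate_lda_corpus(num_docs, T=10, v=10, N_range=(100, 150),
                        alpha=0.1, beta=0.1, seed=0):
    """LDA-style generator: T topics, v words associated with each topic
    (vocabulary size T * v), document lengths uniform over N_range."""
    rng = np.random.default_rng(seed)
    vocab_size = T * v
    # Each topic puts all of its mass on its own block of v words; the
    # within-block distribution is Dirichlet(beta) (an illustrative choice).
    topic_word = np.zeros((T, vocab_size))
    for t in range(T):
        topic_word[t, t * v:(t + 1) * v] = rng.dirichlet([beta] * v)

    corpus = []
    for _ in range(num_docs):
        N = rng.integers(N_range[0], N_range[1] + 1)   # document length
        theta = rng.dirichlet([alpha] * T)             # document-topic mixture
        topics = rng.choice(T, size=N, p=theta)        # latent topic per token
        words = [int(rng.choice(vocab_size, p=topic_word[z])) for z in topics]
        corpus.append(words)
    return corpus

if __name__ == "__main__":
    docs = generate_lda_corpus(num_docs=3)
    print([len(d) for d in docs], docs[0][:10])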
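
The Experiment Setup row's p_m, p_c, p_r follow the masked-language-modeling objective of Devlin et al. (2019). Below is a minimal corruption sketch under one reading of those parameters: each token is selected for prediction with probability p_m, and a selected token is swapped for a random token with probability p_c, kept unchanged with probability p_r, or masked otherwise. The function name mlm_corrupt, the MASK_ID placeholder, and this exact mapping of p_m/p_c/p_r are assumptions; the paper's Section 3.2 defines the objective precisely.

```python
import numpy as np

MASK_ID = 0  # placeholder id for the [MASK] token in this sketch

def mlm_corrupt(tokens, vocab_size, p_m=0.15, p_c=0.1, p_r=0.1, seed=0):
    """Select each token for prediction with probability p_m; a selected
    token is replaced by a uniformly random token with probability p_c,
    kept unchanged with probability p_r, and replaced by [MASK] otherwise."""
    rng = np.random.default_rng(seed)
    tokens = np.asarray(tokens)
    inputs = tokens.copy()
    selected = rng.random(tokens.shape) < p_m           # prediction targets
    u = rng.random(tokens.shape)
    to_random = selected & (u < p_c)                    # corrupt with a random word
    to_keep = selected & (u >= p_c) & (u < p_c + p_r)   # leave unchanged
    to_mask = selected & ~to_random & ~to_keep          # replace with [MASK]
    inputs[to_mask] = MASK_ID
    inputs[to_random] = rng.integers(1, vocab_size, size=int(to_random.sum()))
    labels = np.where(selected, tokens, -100)           # -100 = ignored by the loss
    return inputs, labels

if __name__ == "__main__":
    x, y = mlm_corrupt(np.arange(1, 21), vocab_size=100)
    print(x)
    print(y)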