Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing
Authors: Kento Nishi, Rahul Ramesh, Maya Okawa, Mikail Khona, Hidenori Tanaka, Ekdeep Singh Lubana
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through evaluations of edited models on this task, we show that KE inadvertently affects representations of entities beyond the targeted one, distorting relevant structures that allow a model to infer unseen knowledge about an entity. We further corroborate our findings in naturalistic settings with pre-trained Llama and Mamba models as well. |
| Researcher Affiliation | Collaboration | 1Harvard College 2CBS-NTT Program in Physics of Intelligence, Harvard University 3Physics and Informatics Lab, NTT Research Inc. 4Computer and Information Science, University of Pennsylvania 5Department of Physics, Massachusetts Institute of Technology. |
| Pseudocode | Yes | Algorithm 1: Generate a single sequence containing a collection of facts. |
| Open Source Code | Yes | Please find the source code for our experiments at github.com/Kento_Nishi/KE-ICML-2025. |
| Open Datasets | Yes | To quantify model performance before and after editing, we adopt the MMLU-Redux reasoning benchmark (Gema et al., 2024) with the ZeroEval prompting framework (Lin, 2024) to elicit chain-of-thought reasoning. |
| Dataset Splits | No | The paper defines concepts like 'edit sub-graph,' 'retain sub-graph,' and 'test sub-graph' for knowledge editing. It also mentions that certain facts are 'held out' for logical and compositional inference tasks. Additionally, for ROME, it states: 'The covariance matrix C is estimated by randomly sampling 10^5 inputs from the validation dataset.' However, specific percentages or absolute sample counts for the main training/validation/test splits of the synthetic data generated are not provided, nor are explicit split details for the MMLU-Redux benchmark used in the LLM experiments. |
| Hardware Specification | No | The paper states: 'For all experiments (unless stated otherwise), we use a 2-layer nanoGPT Transformer (Karpathy, 2021).' It also mentions using 'pre-trained Llama and Mamba models.' However, no specific GPU models, CPU models, or other hardware specifications used for running the experiments are provided. |
| Software Dependencies | No | The paper mentions: 'Our Transformer model is a fork of the open-source nanoGPT repository (https://github.com/karpathy/nanoGPT).' It also states: 'The value optimization is performed using the Adam optimizer, with hyperparameters lr = 10^-3 and weight decay = 10^-4.' Although these name software components and tools, the text gives no version numbers for them (e.g., Python version, PyTorch version, nanoGPT version). |
| Experiment Setup | Yes | We train a Transformer model using next-token prediction on the synthetic data generated from the above data generation process. For all experiments (unless stated otherwise), we use a 2-layer nanoGPT Transformer (Karpathy, 2021). Batch size: 256; Context length: 16; Optimizer: Adam; Learning rate: 6 × 10^-4; Training epochs: 1.5 × 10^5; Decay iterations: 1.5 × 10^5; Momentum: β1 = 0.9, β2 = 0.95; Activation function: GeLU; Block size: 16; Embedding dimensions: 24; Heads: 12 |
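
For readers who want to reuse the Experiment Setup row, the reported hyperparameters can be gathered into a single configuration object. The following is a minimal sketch, not the authors' code: the class name `TrainConfig`, its field names, and the `head_dim` helper are hypothetical, and only the numeric values and the optimizer and activation names come from the table above. The report lists both "Context length" and "Block size" as 16, so the sketch assumes they refer to the same setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the hyperparameters quoted in the table."""

    n_layer: int = 2                 # "2-layer nanoGPT Transformer"
    batch_size: int = 256            # "Batch size: 256"
    block_size: int = 16             # "Context length: 16" / "Block size: 16" (assumed identical)
    n_embd: int = 24                 # "Embedding dimensions: 24"
    n_head: int = 12                 # "Heads: 12"
    optimizer: str = "adam"          # "Optimizer: Adam"
    learning_rate: float = 6e-4      # "Learning rate: 6 x 10^-4"
    training_epochs: int = 150_000   # "Training epochs: 1.5 x 10^5", label kept as reported
    decay_iters: int = 150_000       # "Decay iterations: 1.5 x 10^5"
    beta1: float = 0.9               # "Momentum: beta1 = 0.9"
    beta2: float = 0.95              # "Momentum: beta2 = 0.95"
    activation: str = "gelu"         # "Activation function: GeLU"

    def head_dim(self) -> int:
        """Per-head width, assuming the embedding is split evenly across heads."""
        if self.n_embd % self.n_head != 0:
            raise ValueError("n_embd must be divisible by n_head")
        return self.n_embd // self.n_head


if __name__ == "__main__":
    cfg = TrainConfig()
    print(cfg)
    print("per-head dimension:", cfg.head_dim())
```

With 24 embedding dimensions and 12 heads, an even split gives each attention head a width of 2. The hardware and software versions that the table marks as missing are deliberately left out of the sketch, since the report does not supply them.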