Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Demystifying Singular Defects in Large Language Models

Authors: Haoqi Wang, Tong Zhang, Mathieu Salzmann

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically validate these findings on a variety of LLMs, including LLaMA2 (Touvron et al., 2023), Phi3 (Abdin et al., 2024), MPT (Team, 2023), Pythia (Biderman et al., 2023), Vicuna1.5 (Platzer & Puschner, 2021), Falcon2 (Malartic et al., 2024), GPT2 (Radford et al., 2019), Qwen2.5 (Team, 2024), to name a few.
Researcher Affiliation | Academia | (1) School of Computer and Communication Sciences, EPFL, Switzerland; (2) University of Chinese Academy of Sciences, China; (3) Swiss Data Science Center, Switzerland.
Pseudocode | No | The paper describes methods using mathematical formulations and textual explanations, but it does not contain any explicitly labeled pseudocode blocks or algorithms.
Open Source Code | Yes | Code is released at https://github.com/haoqiwang/singular_defect.
Open Datasets | Yes | Taking LLaMA2-7B as an example, we extract the hidden states of 1K random rows from the WikiText2-v1 dataset (Merity et al., 2017) across all layers and compute the norm of each token in each layer.
Dataset Splits | Yes | Taking LLaMA2-7B as an example, we extract the hidden states of 1K random rows from the WikiText2-v1 dataset (Merity et al., 2017) across all layers and compute the norm of each token in each layer.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to conduct its experiments or analysis.
Software Dependencies | No | The paper mentions various LLMs and quantization techniques but does not specify the versions of key software libraries (e.g., Python, PyTorch, CUDA) used for its own experimental setup.
Experiment Setup | No | The paper describes the analytical methodology and observed phenomena in LLMs. While it discusses aspects like quantization strategies, it does not provide specific hyperparameters (e.g., learning rate, batch size, number of epochs) or detailed system-level training settings for reproducing experiments.