Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Demystifying Singular Defects in Large Language Models

Authors: Haoqi Wang, Tong Zhang, Mathieu Salzmann

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically validate these findings on a variety of LLMs, including LLaMA2 (Touvron et al., 2023), Phi3 (Abdin et al., 2024), MPT (Team, 2023), Pythia (Biderman et al., 2023), Vicuna1.5 (Platzer & Puschner, 2021), Falcon2 (Malartic et al., 2024), GPT2 (Radford et al., 2019), Qwen2.5 (Team, 2024), to name a few.
Researcher Affiliation | Academia | (1) School of Computer and Communication Sciences, EPFL, Switzerland; (2) University of Chinese Academy of Sciences, China; (3) Swiss Data Science Center, Switzerland.
Pseudocode | No | The paper describes methods using mathematical formulations and textual explanations, but it does not contain any explicitly labeled pseudocode blocks or algorithms.
Open Source Code | Yes | Code is released at https://github.com/haoqiwang/singular_defect.
Open Datasets | Yes | Taking LLaMA2-7B as an example, we extract the hidden states of 1K random rows from the WikiText2-v1 dataset (Merity et al., 2017) across all layers and compute the norm of each token in each layer.
Dataset Splits | Yes | Taking LLaMA2-7B as an example, we extract the hidden states of 1K random rows from the WikiText2-v1 dataset (Merity et al., 2017) across all layers and compute the norm of each token in each layer.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to conduct its experiments or analysis.
Software Dependencies | No | The paper mentions various LLMs and quantization techniques but does not specify the versions of key software libraries (e.g., Python, PyTorch, CUDA) used for its own experimental setup.
Experiment Setup | No | The paper describes the analytical methodology and observed phenomena in LLMs. While it discusses aspects like quantization strategies, it does not provide specific hyperparameters (e.g., learning rate, batch size, number of epochs) or detailed system-level training settings for reproducing experiments.