Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
Authors: Javier Ferrando, Oscar Obeso, Senthooran Rajamanoharan, Neel Nanda
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using sparse autoencoders as an interpretability tool, we discover that a key part of these mechanisms is entity recognition, where the model detects if an entity is one it can recall facts about. ... We use Gemma Scope (Lieberum et al., 2024)...and find internal representations that suggest to encode knowledge awareness in Gemma 2 2B and 9B. ... We use a test set sample of 100 questions about unknown entities, and measure the number of times the model refuses by steering (as in Equation (4)). ... To do the analysis, we perform activation patching (Geiger et al., 2020; Vig et al., 2020; Meng et al., 2022a) on the residual streams and attention heads outputs... |
| Researcher Affiliation | Academia | Javier Ferrando (1, 2), Oscar Obeso (3), Senthooran Rajamanoharan, Neel Nanda; (1) U. Politècnica de Catalunya, (2) Barcelona Supercomputing Center, (3) ETH Zürich |
| Pseudocode | No | The paper describes methods through mathematical equations (e.g., Equation (1), (2), (4), (6), (9), (12)) and narrative text but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | We make the codebase available at https://github.com/javiferran/sae_entities. |
| Open Datasets | Yes | To study how language models reflect knowledge awareness about entities, we build a dataset with four different entity types: (basketball) players, movies, cities, and songs from Wikidata (Vrandečić & Krötzsch, 2024). |
| Dataset Splits | Yes | Finally, we split the entities into train/validation/test (50%, 10%, 40%) sets. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processors, or memory specifications used for running the experiments. It only mentions using Gemma 2 2B, 9B, and Llama 3.1 8B models. |
| Software Dependencies | No | The paper mentions using "Gemma Scope" and "Llama Scope" which are suites of Sparse Autoencoders, and references "Neuronpedia". However, it does not provide specific version numbers for these tools or any other key software libraries like PyTorch, TensorFlow, or Python. |
| Experiment Setup | Yes | We use a validation set to select an appropriate steering coefficient α. ... We select α ∈ [400, 550], which corresponds to around two times the norm of the residual stream in the layers where the entity recognition latents are present (Appendix E). |
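The steering setup quoted above (adding a latent direction to the residual stream with a coefficient α of roughly twice the residual-stream norm) can be sketched as follows. This is a minimal, self-contained illustration, not the paper's implementation: the function name `steer`, the random vectors, and the standalone NumPy setting are all assumptions for demonstration.

```python
import numpy as np

def steer(resid: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Add a steering vector to a residual-stream activation.

    resid:     residual-stream activation at one layer, shape (d_model,)
    direction: SAE latent decoder direction, shape (d_model,);
               normalized to unit length before scaling
    alpha:     steering coefficient (the paper selects it on a
               validation set, at ~2x the residual-stream norm)
    """
    unit = direction / np.linalg.norm(direction)
    return resid + alpha * unit

rng = np.random.default_rng(0)
d_model = 2304  # hidden size of Gemma 2 2B
resid = rng.standard_normal(d_model)
direction = rng.standard_normal(d_model)

# Choose alpha as about two times the residual-stream norm.
alpha = 2.0 * np.linalg.norm(resid)
steered = steer(resid, direction, alpha)

# The intervention moves the activation by exactly alpha along the
# (unit-norm) latent direction.
print(np.isclose(np.linalg.norm(steered - resid), alpha))
```

In practice this addition would be applied via a forward hook at the layers where the entity-recognition latents are active, and the refusal rate on held-out unknown-entity questions measured before and after.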