Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Stealth edits to large language models

Authors: Oliver Sutton, Qinghua Zhou, Wei Wang, Desmond Higham, Alexander N Gorban, Alexander Bastounis, Ivan Tyukin

NeurIPS 2024 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experimental results illustrate and support our methods and their theoretical underpinnings. Demos and source code are available at https://github.com/qinghua-zhou/stealth-edits.
Researcher Affiliation	Academia	Oliver J. Sutton King's College London EMAIL Qinghua Zhou King's College London EMAIL Wei Wang University of Leicester EMAIL Desmond J. Higham University of Edinburgh EMAIL Alexander N. Gorban University of Leicester EMAIL Alexander Bastounis King's College London EMAIL Ivan Y. Tyukin King's College London EMAIL
Pseudocode	Yes	Algorithm 1: An in-place edit to correct a hallucination in a language model
Open Source Code	Yes	Demos and source code are available at https://github.com/qinghua-zhou/stealth-edits.
Open Datasets	Yes	Our experiments require a source of hallucinations to edit, which we draw from the Multi-CounterFact (MCF) [26] and ZsRE [27] datasets.
Dataset Splits	No	The paper describes sampling prompts from datasets (MCF, ZsRE, Wikipedia) but does not provide explicit training, validation, or test set percentages, counts, or predefined splits for these datasets.
Hardware Specification	Yes	All models can fit any GPU with 24G VRAM. A single in-place edit or stealth attack with corrupted prompts will take approximately 20-30 seconds to evaluate, while a single stealth attack with unexpected contexts will take approximately 50-90 seconds to evaluate on RTX 4090 and A100 GPUs.
Software Dependencies	No	The paper mentions using the 'nlpaug' package and its own 'stealth-edits' package, but it does not provide specific version numbers for these or other software dependencies.
Experiment Setup	Yes	All experiments used θ = 0.005, α = θ 1 and = 50. The impact of different values of θ is investigated in Section C.5.