Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Stealth edits to large language models
Authors: Oliver Sutton, Qinghua Zhou, Wei Wang, Desmond Higham, Alexander N Gorban, Alexander Bastounis, Ivan Tyukin
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results illustrate and support our methods and their theoretical underpinnings. Demos and source code are available at https://github.com/qinghua-zhou/stealth-edits. |
| Researcher Affiliation | Academia | Oliver J. Sutton (King's College London); Qinghua Zhou (King's College London); Wei Wang (University of Leicester); Desmond J. Higham (University of Edinburgh); Alexander N. Gorban (University of Leicester); Alexander Bastounis (King's College London); Ivan Y. Tyukin (King's College London) |
| Pseudocode | Yes | Algorithm 1: An in-place edit to correct a hallucination in a language model |
| Open Source Code | Yes | Demos and source code are available at https://github.com/qinghua-zhou/stealth-edits. |
| Open Datasets | Yes | Our experiments require a source of hallucinations to edit, which we draw from the Multi-CounterFact (MCF) [26] and ZsRE [27] datasets. |
| Dataset Splits | No | The paper describes sampling prompts from datasets (MCF, ZsRE, Wikipedia) but does not provide explicit training, validation, or test set percentages, counts, or predefined splits for these datasets. |
| Hardware Specification | Yes | All models can fit any GPU with 24G VRAM. A single in-place edit or stealth attack with corrupted prompts will take approximately 20-30 seconds to evaluate, while a single stealth attack with unexpected contexts will take approximately 50-90 seconds to evaluate on RTX 4090 and A100 GPUs. |
| Software Dependencies | No | The paper mentions using the 'nlpaug' package and its own 'stealth-edits' package, but it does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | All experiments used θ = 0.005, α = θ 1 and = 50. The impact of different values of θ is investigated in Section C.5. |