Stealth edits to large language models

Authors: Oliver Sutton, Qinghua Zhou, Wei Wang, Desmond Higham, Alexander N Gorban, Alexander Bastounis, Ivan Tyukin

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results illustrate and support our methods and their theoretical underpinnings. Demos and source code are available at https://github.com/qinghua-zhou/stealth-edits.
Researcher Affiliation | Academia | Oliver J. Sutton, King's College London, oliver.sutton@kcl.ac.uk; Qinghua Zhou, King's College London, qinghua.zhou@kcl.ac.uk; Wei Wang, University of Leicester, ww152@le.ac.uk; Desmond J. Higham, University of Edinburgh, d.j.higham@ed.ac.uk; Alexander N. Gorban, University of Leicester, a.n.gorban@le.ac.uk; Alexander Bastounis, King's College London, alexander.bastounis@kcl.ac.uk; Ivan Y. Tyukin, King's College London, ivan.tyukin@kcl.ac.uk
Pseudocode | Yes | Algorithm 1: An in-place edit to correct a hallucination in a language model
Open Source Code | Yes | Demos and source code are available at https://github.com/qinghua-zhou/stealth-edits.
Open Datasets | Yes | Our experiments require a source of hallucinations to edit, which we draw from the Multi-CounterFact (MCF) [26] and ZsRE [27] datasets.
Dataset Splits | No | The paper describes sampling prompts from datasets (MCF, ZsRE, Wikipedia) but does not provide explicit training, validation, or test set percentages, counts, or predefined splits for these datasets.
Hardware Specification | Yes | All models can fit any GPU with 24G VRAM. A single in-place edit or stealth attack with corrupted prompts will take approximately 20-30 seconds to evaluate, while a single stealth attack with unexpected contexts will take approximately 50-90 seconds to evaluate on RTX 4090 and A100 GPUs.
Software Dependencies | No | The paper mentions using the 'nlpaug' package and its own 'stealth-edits' package, but it does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | All experiments used θ = 0.005, α = θ⁻¹Δ and Δ = 50. The impact of different values of θ is investigated in Section C.5.
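
The per-edit timings in the hardware row can be turned into a rough compute budget when planning a reproduction. The sketch below is only an illustration: the function name `estimate_hours`, the dictionary keys, and the batch size of 1000 are hypothetical choices rather than anything taken from the paper or its repository, and it assumes edits are evaluated one after another on a single GPU.

```python
# Reported per-edit evaluation times in seconds (RTX 4090 and A100 GPUs).
# The dictionary keys are hypothetical labels chosen for this sketch.
REPORTED_SECONDS = {
    "in_place_edit": (20, 30),
    "stealth_attack_corrupted_prompt": (20, 30),
    "stealth_attack_unexpected_context": (50, 90),
}


def estimate_hours(kind: str, n_edits: int) -> tuple[float, float]:
    """Return (low, high) wall-clock hours for n_edits sequential evaluations."""
    if n_edits < 0:
        raise ValueError("n_edits must be non-negative")
    low_s, high_s = REPORTED_SECONDS[kind]
    return n_edits * low_s / 3600, n_edits * high_s / 3600


if __name__ == "__main__":
    n_edits = 1000  # hypothetical batch size, not a figure from the paper
    for kind in REPORTED_SECONDS:
        low, high = estimate_hours(kind, n_edits)
        print(f"{kind}: {low:.1f} to {high:.1f} GPU-hours for {n_edits} edits")
```

These are coarse bounds only: the reported figures are approximate, and actual run time will depend on the model, the GPU, and any parallelism used.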