Stealth edits to large language models
Authors: Oliver Sutton, Qinghua Zhou, Wei Wang, Desmond Higham, Alexander N Gorban, Alexander Bastounis, Ivan Tyukin
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results illustrate and support our methods and their theoretical underpinnings. Demos and source code are available at https://github.com/qinghua-zhou/stealth-edits. |
| Researcher Affiliation | Academia | Oliver J. Sutton, King's College London (oliver.sutton@kcl.ac.uk); Qinghua Zhou, King's College London (qinghua.zhou@kcl.ac.uk); Wei Wang, University of Leicester (ww152@le.ac.uk); Desmond J. Higham, University of Edinburgh (d.j.higham@ed.ac.uk); Alexander N. Gorban, University of Leicester (a.n.gorban@le.ac.uk); Alexander Bastounis, King's College London (alexander.bastounis@kcl.ac.uk); Ivan Y. Tyukin, King's College London (ivan.tyukin@kcl.ac.uk) |
| Pseudocode | Yes | Algorithm 1: An in-place edit to correct a hallucination in a language model |
| Open Source Code | Yes | Demos and source code are available at https://github.com/qinghua-zhou/stealth-edits. |
| Open Datasets | Yes | Our experiments require a source of hallucinations to edit, which we draw from the Multi-CounterFact (MCF) [26] and zsRE [27] datasets. |
| Dataset Splits | No | The paper describes sampling prompts from datasets (MCF, zsRE, Wikipedia) but does not provide explicit training, validation, or test set percentages, counts, or predefined splits for these datasets. |
| Hardware Specification | Yes | All models fit on any GPU with 24 GB of VRAM. On RTX 4090 and A100 GPUs, a single in-place edit or stealth attack with corrupted prompts takes approximately 20-30 seconds to evaluate, while a single stealth attack with unexpected contexts takes approximately 50-90 seconds. |
| Software Dependencies | No | The paper mentions using the 'nlpaug' package and its own 'stealth-edits' package, but it does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | All experiments used θ = 0.005, α = θ⁻¹ and = 50. The impact of different values of θ is investigated in Section C.5. |
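
For concreteness, the value of α implied by the Experiment Setup row can be worked out directly. The snippet below is a minimal sketch that only restates the reported θ and derives α, assuming α is exactly the reciprocal of θ (α = θ⁻¹). The variable names are illustrative and are not taken from the stealth-edits package; the third setting (= 50) is left out because its symbol is not recoverable from this page.

```python
# Hyperparameters from the Experiment Setup row (θ and α only).
# Assumption: α is exactly the reciprocal of θ, i.e. α = 1/θ.
theta = 0.005          # θ as reported for all experiments
alpha = 1.0 / theta    # α = θ⁻¹

print(f"theta = {theta}, alpha = {alpha:g}")  # theta = 0.005, alpha = 200
```

With θ = 0.005 this gives α = 200, so any other θ studied in Section C.5 would correspond to a proportionally different α under the same rule.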