Exploiting LLM Quantization
Authors: Kazuki Egashira, Mark Vero, Robin Staab, Jingxuan He, Martin Vechev
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally demonstrate the feasibility and severity of such an attack across three diverse scenarios: vulnerable code generation, content injection, and over-refusal attack. |
| Researcher Affiliation | Academia | Kazuki Egashira, Mark Vero, Robin Staab, Jingxuan He, Martin Vechev; Department of Computer Science, ETH Zurich; kegashira@ethz.ch, {mark.vero,robin.staab,jingxuan.he,martin.vechev}@inf.ethz.ch |
| Pseudocode | No | No structured pseudocode or algorithm blocks explicitly labeled as such were found. |
| Open Source Code | Yes | Code available at: https://github.com/eth-sri/llm-quantization-attack |
| Open Datasets | Yes | For D_instr, we used the Code-Alpaca dataset. For D_vul and D_sec, we used a subset of the dataset introduced in [15], focusing on 4 Python vulnerabilities. Following He and Vechev [15], we run the static-analyzer-based evaluation method on the test cases that correspond to the tuned vulnerabilities, and we report the percentage of code completions without security vulnerabilities as Code Security. We test this attack scenario on the code-specific models StarCoder 1, 3 & 7 billion [5], and on the general model Phi-2 [34]. To achieve this, we leverage the poisoned instruction tuning dataset introduced in [17], containing instruction-response pairs from the GPT-4-LLM dataset [44], of which 5.2k are modified to contain refusals to otherwise harmless questions. We evaluate this on 1.5k instructions from the databricks-dolly-15k dataset [20]. |
| Dataset Splits | No | The paper uses standard benchmarks (MMLU, TruthfulQA, HumanEval, MBPP) which have predefined evaluation setups. It also uses datasets such as Code-Alpaca, the dataset introduced in [15], GPT-4-LLM, and databricks-dolly-15k for training and evaluation. However, it does not explicitly state the specific train/validation/test splits applied to these datasets for their own experimental procedures (e.g., for fine-tuning or removal phases). |
| Hardware Specification | Yes | All experiments on the paper were conducted on either an H100 (80GB) or an 8x A100 (40GB) compute node. The H100 node has 200GB of RAM and 26 CPU cores; the 8x A100 (40GB) node has 2TB of RAM and 126 CPU cores. |
| Software Dependencies | No | The paper mentions software like Adam [48] (optimizer), Hugging Face Transformers [7], GPT-4 [2] judge, and GitHub CodeQL [49], but does not provide specific version numbers for these software components or other libraries that would be necessary for full reproducibility. |
| Experiment Setup | Yes (see the configuration sketch below the table) | We perform instruction tuning for 1 epoch for injection and 2 epochs for removal with PGD, using a learning rate of 2e-5 for both. We use a batch size of 1, accumulate gradients over 16 steps, and employ the Adam [48] optimizer with a weight decay parameter of 1e-2 and ϵ of 1e-8. We clip the accumulated gradients to have norm 1. |
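
For reference, the hyperparameters quoted in the Experiment Setup row can be collected into a Hugging Face `TrainingArguments` object. This is a minimal sketch, not the authors' released code: the output directory is a placeholder, data loading and the model are omitted, and the paper's PGD projection onto the quantization-preserving weight region is not reproduced here.

```python
# Minimal sketch of the reported fine-tuning hyperparameters using
# Hugging Face Transformers (the paper does not specify versions).
# The output path is a hypothetical placeholder; dataset loading, the
# model, and the paper's PGD projection step are intentionally omitted.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./quantization-attack-injection",  # hypothetical path
    num_train_epochs=1,              # 1 epoch for injection (2 for removal with PGD)
    learning_rate=2e-5,
    per_device_train_batch_size=1,   # batch size of 1 ...
    gradient_accumulation_steps=16,  # ... with gradients accumulated over 16 steps
    weight_decay=1e-2,               # Adam weight decay
    adam_epsilon=1e-8,               # Adam epsilon
    max_grad_norm=1.0,               # clip accumulated gradients to norm 1
)
```

These arguments could then be passed to `transformers.Trainer` together with a model and a tokenized instruction-tuning dataset; note that the removal phase described in the paper additionally constrains updates to stay within the quantization-preserving region, which the stock Trainer does not implement.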