Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization
Authors: Kai Hu, Weichen Yu, Yining Li, Tianjun Yao, Xiang Li, Wenhe Liu, Lijun Yu, Zhiqiang Shen, Kai Chen, Matt Fredrikson
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that our method is more effective and efficient than state-of-the-art token-level methods. |
| Researcher Affiliation | Academia | 1 Carnegie Mellon University, 2 Shanghai AI Laboratory, 3 Mohamed bin Zayed University of AI |
| Pseudocode | Yes | Algorithm 1: Transform a vector to be in P and be S-sparse |
| Open Source Code | Yes | The code is available at https://github.com/hukkai/adc_llm_attack. |
| Open Datasets | Yes | AdvBench harmful behaviors subset [46] contains 520 harmful behavior requests. |
| Dataset Splits | No | The paper evaluates its attack on predefined benchmarks (AdvBench, HarmBench) by attempting to jailbreak LLMs, but it does not define training, validation, or test splits for its own optimization process; the method is an attack-generation algorithm rather than a model trained on a dataset. |
| Hardware Specification | Yes | the wall-clock time is the average real-time elapsed per sample on a single A100 machine. |
| Software Dependencies | No | The paper does not specify the version numbers of software dependencies (e.g., programming languages, libraries, or frameworks) used to implement and run its experiments. |
| Experiment Setup | Yes | In all our experiments, we employ the momentum optimizer with a learning rate of 10 and a momentum of 0.99, and do not adjust them during the optimization. |
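The table's pseudocode and setup rows reference two concrete ingredients: a projection that maps a vector into the probability simplex P under an S-sparsity constraint (Algorithm 1), and a momentum optimizer with learning rate 10 and momentum 0.99. A minimal NumPy sketch of how these pieces could fit together is below; `grad_fn`, `sparse_simplex_project`, and the loop structure are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (standard sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(u) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

def sparse_simplex_project(v, s):
    """Keep the s largest coordinates of v, project them onto the simplex,
    zero the rest -- one plausible reading of an S-sparse simplex constraint."""
    out = np.zeros_like(v)
    idx = np.argsort(v)[-s:]          # indices of the s largest entries
    out[idx] = project_simplex(v[idx])
    return out

def optimize(x0, grad_fn, s, steps=100, lr=10.0, mu=0.99):
    """Projected descent with the reported hyperparameters (lr=10, momentum=0.99).
    grad_fn is a hypothetical stand-in for the gradient of the adversarial loss."""
    x, buf = x0.copy(), np.zeros_like(x0)
    for _ in range(steps):
        buf = mu * buf + grad_fn(x)   # momentum accumulation
        x = sparse_simplex_project(x - lr * buf, s)
    return x
```

Each iterate stays a valid S-sparse probability vector, which is what lets the dense relaxation be mapped back to discrete tokens at the end of the optimization.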