Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization
Authors: Kai Hu, Weichen Yu, Yining Li, Tianjun Yao, Xiang Li, Wenhe Liu, Lijun Yu, Zhiqiang Shen, Kai Chen, Matt Fredrikson
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that our method is more effective and efficient than state-of-the-art token-level methods. |
| Researcher Affiliation | Academia | 1Carnegie Mellon University, 2Shanghai AI Laboratory, 3Mohamed bin Zayed University of AI |
| Pseudocode | Yes | Algorithm 1: Transform a vector to be in P and be S-sparse |
| Open Source Code | Yes | The code is available at https://github.com/hukkai/adc_llm_attack. |
| Open Datasets | Yes | AdvBench harmful behaviors subset [46] contains 520 harmful behavior requests. |
| Dataset Splits | No | The paper evaluates its attack on predefined benchmarks (AdvBench, HarmBench) by attempting to jailbreak LLMs, but it does not specify traditional training, validation, or test splits: the method is an attack-generation algorithm rather than a model trained on a dataset. |
| Hardware Specification | Yes | the wall-clock time is the average real-time elapsed per sample on a single A100 machine. |
| Software Dependencies | No | The paper does not specify the version numbers of software dependencies (e.g., programming languages, libraries, or frameworks) used to implement and run its experiments. |
| Experiment Setup | Yes | In all our experiments, we employ the momentum optimizer with a learning rate of 10 and a momentum of 0.99, and do not adjust them during the optimization. |
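The pseudocode row above refers to the paper's Algorithm 1, which maps a dense vector onto the probability simplex while enforcing S-sparsity. The exact procedure is defined in the paper; as a rough illustration only, the sketch below restricts the vector to its S largest entries and then applies a standard Euclidean projection onto the simplex (Duchi et al. style). The function name `project_sparse_simplex` and the choice of projection are assumptions, not the paper's implementation.

```python
import numpy as np

def project_sparse_simplex(v, S):
    """Illustrative sketch (not the paper's Algorithm 1): keep the S
    largest entries of v, then project that support onto the probability
    simplex (entries >= 0, summing to 1)."""
    v = np.asarray(v, dtype=float)
    # Indices of the S largest entries form the sparse support.
    support = np.argsort(v)[-S:]
    w = v[support]
    # Euclidean projection of w onto the simplex.
    u = np.sort(w)[::-1]                       # sort descending
    css = np.cumsum(u)                         # cumulative sums
    idx = np.arange(1, len(u) + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    proj = np.maximum(w + theta, 0.0)          # shift and clip
    out = np.zeros_like(v)
    out[support] = proj
    return out
```

The result is a valid probability vector with at most S nonzero entries, which matches the constraint set named in the algorithm's title (in P and S-sparse).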