Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization
Authors: Kai Hu, Weichen Yu, Yining Li, Tianjun Yao, Xiang Li, Wenhe Liu, Lijun Yu, Zhiqiang Shen, Kai Chen, Matt Fredrikson
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that our method is more effective and efficient than state-of-the-art token-level methods. |
| Researcher Affiliation | Academia | 1 Carnegie Mellon University, 2 Shanghai AI Laboratory, 3 Mohamed bin Zayed University of AI |
| Pseudocode | Yes | Algorithm 1: Transform a vector to be in P and be S-sparse |
| Open Source Code | Yes | The code is available at https://github.com/hukkai/adc_llm_attack. |
| Open Datasets | Yes | AdvBench harmful behaviors subset [46] contains 520 harmful behavior requests. |
| Dataset Splits | No | The paper evaluates its attack on predefined benchmarks (AdvBench, HarmBench) by attempting to jailbreak LLMs, but it does not define training, validation, or test splits for its own optimization process; the method is an attack-generation algorithm rather than a model trained on a dataset. |
| Hardware Specification | Yes | the wall-clock time is the average real-time elapsed per sample on a single A100 machine. |
| Software Dependencies | No | The paper does not specify the version numbers of software dependencies (e.g., programming languages, libraries, or frameworks) used to implement and run its experiments. |
| Experiment Setup | Yes | In all our experiments, we employ the momentum optimizer with a learning rate of 10 and a momentum of 0.99, and do not adjust them during the optimization. |
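The table's pseudocode and setup rows reference two concrete ingredients: a projection that maps a vector into the probability simplex P under an S-sparsity constraint (Algorithm 1), and a momentum optimizer with learning rate 10 and momentum 0.99. A minimal NumPy sketch of how these pieces could fit together is below; `grad_fn`, `sparse_simplex_project`, and the loop structure are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (standard sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(u) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

def sparse_simplex_project(v, s):
    """Keep the s largest coordinates of v, project them onto the simplex,
    zero the rest -- one plausible reading of an S-sparse simplex constraint."""
    out = np.zeros_like(v)
    idx = np.argsort(v)[-s:]          # indices of the s largest entries
    out[idx] = project_simplex(v[idx])
    return out

def optimize(x0, grad_fn, s, steps=100, lr=10.0, mu=0.99):
    """Projected descent with the reported hyperparameters (lr=10, momentum=0.99).
    grad_fn is a hypothetical stand-in for the gradient of the adversarial loss."""
    x, buf = x0.copy(), np.zeros_like(x0)
    for _ in range(steps):
        buf = mu * buf + grad_fn(x)   # momentum accumulation
        x = sparse_simplex_project(x - lr * buf, s)
    return x
```

Each iterate stays a valid S-sparse probability vector, which is what lets the dense relaxation be mapped back to discrete tokens at the end of the optimization.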