reproducibilityindex.ai

Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts

Authors: Mikayel Samvelyan, Sharath Chandra Raparthy, Andrei Lupu, Eric Hambro, Aram Markosyan, Manish Bhatt, Yuning Mao, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim Rocktäschel, Roberta Raileanu

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the effectiveness of RAINBOW TEAMING through extensive experiments targeting several state-of-the-art LLMs fine-tuned on safety-aligned data, including the Llama 2-chat [65] and Llama 3-Instruct [1] models.
Researcher Affiliation	Collaboration	Mikayel Samvelyan (1,2), Sharath Chandra Raparthy (1), Andrei Lupu (1,3), Eric Hambro (1), Aram H. Markosyan (1), Manish Bhatt (1), Yuning Mao (1), Minqi Jiang (1,2), Jack Parker-Holder (2), Jakob Foerster (1,3), Tim Rocktäschel (2), Roberta Raileanu (1,2). Affiliations: (1) Meta; (2) University College London; (3) University of Oxford.
Pseudocode	Yes	"Algorithm 1 in Appendix B provides the pseudocode of this method." and "Algorithm 2 in Appendix B provides the pseudocode of our method."
Open Source Code	No	While we have not open-sourced our code or our synthetic data alongside our paper, we are assessing the safety and legal concerns of doing so at a future date.
Open Datasets	Yes	Table 11: List of hyperparameters used in question answering experiments (columns: Experiments, Hyperparameter, Value). RAINBOW TEAMING: Number of Initial Examples 256; Dataset of Initial Examples TriviaQA [30].
Dataset Splits	Yes	We perform a 12/3 train-test split and use Llama 2-chat 70B with a handcrafted system prompt to generate safe refusal prompts for the train set.
Hardware Specification	Yes	We conducted our experiments on a cluster of A100 GPUs, with access ranging from 128 to 256 GPUs throughout the project.
Software Dependencies	No	The paper mentions various LLMs and tools used (e.g., "Llama 2-chat 7B", "Llama 3-Instruct 8B", "GPT-4", "Llama Guard"), but does not specify version numbers for general software dependencies like Python, PyTorch, or CUDA.
Experiment Setup	Yes	Table 10: List of hyperparameters used in safety experiments (columns: Experiments, Hyperparameter, Value). RAINBOW TEAMING: Number of Initial Examples 0; Batch Size 32; Iterations 2000; BLEU Similarity Filter 0.6; Archive Sampling Temperature 0.1; Archive Size 100. Generator Parameters: Temperature 0.7; Top-k 0.95; Maximum Tokens 256. Learning Rate 2e-7; Batch Size 32; Learning Rate Scheduler Constant; Sequence Length 4096.
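
Since the code is not open-sourced, anyone attempting a reproduction has to transcribe the Experiment Setup row by hand. The following is a minimal sketch of how those Table 10 values might be captured as a typed configuration. All class and field names are hypothetical and do not come from the paper. The grouping of the last four entries under a "training" heading is an assumption, because the flattened row does not name their group. The learning rate is read as 2e-7 on the assumption that the extracted "2e 7" lost a minus sign.

```python
from dataclasses import dataclass, field, asdict


@dataclass(frozen=True)
class SearchConfig:
    """Entries listed under RAINBOW TEAMING in the Table 10 row."""
    num_initial_examples: int = 0
    batch_size: int = 32
    iterations: int = 2000
    bleu_similarity_filter: float = 0.6
    archive_sampling_temperature: float = 0.1
    archive_size: int = 100


@dataclass(frozen=True)
class GeneratorConfig:
    """Entries listed under Generator Parameters."""
    temperature: float = 0.7
    top_k: float = 0.95  # reported as "Top-k 0.95"; kept verbatim, not reinterpreted
    max_tokens: int = 256


@dataclass(frozen=True)
class TrainingConfig:
    """Remaining entries; the group name here is an assumption."""
    learning_rate: float = 2e-7  # assumes "2e 7" in the extracted row means 2e-7
    batch_size: int = 32
    lr_scheduler: str = "constant"
    sequence_length: int = 4096


@dataclass(frozen=True)
class SafetyExperimentConfig:
    """Hypothetical container bundling the three groups above."""
    search: SearchConfig = field(default_factory=SearchConfig)
    generator: GeneratorConfig = field(default_factory=GeneratorConfig)
    training: TrainingConfig = field(default_factory=TrainingConfig)


config = SafetyExperimentConfig()
config_dict = asdict(config)  # nested plain dict, convenient for logging or JSON export
```

Keeping the values in frozen dataclasses makes accidental edits fail loudly, and `asdict` gives a plain nested dictionary that can be logged next to results so each run records the settings it used.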