Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
Authors: Mikayel Samvelyan, Sharath Chandra Raparthy, Andrei Lupu, Eric Hambro, Aram Markosyan, Manish Bhatt, Yuning Mao, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim Rocktäschel, Roberta Raileanu
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of RAINBOW TEAMING through extensive experiments targeting several state-of-the-art LLMs fine-tuned on safety-aligned data, including the Llama 2-chat [65] and Llama 3-Instruct [1] models. |
| Researcher Affiliation | Collaboration | Mikayel Samvelyan (1,2), Sharath Chandra Raparthy (1), Andrei Lupu (1,3), Eric Hambro (1), Aram H. Markosyan (1), Manish Bhatt (1), Yuning Mao (1), Minqi Jiang (1,2), Jack Parker-Holder (2), Jakob Foerster (1,3), Tim Rocktäschel (2), Roberta Raileanu (1,2); 1 Meta, 2 University College London, 3 University of Oxford |
| Pseudocode | Yes | "Algorithm 1 in Appendix B provides the pseudocode of this method." and "Algorithm 2 in Appendix B provides the pseudocode of our method." (A hedged sketch of the corresponding archive-based loop appears below the table.) |
| Open Source Code | No | While we have not open-sourced our code or our synthetic data alongside our paper, we are assessing the safety and legal concerns of doing so at a future date. |
| Open Datasets | Yes | Table 11: List of hyperparameters used in question answering experiments. RAINBOW TEAMING: Number of Initial Examples = 256; Dataset of Initial Examples = Trivia QA [30] |
| Dataset Splits | Yes | We perform a 12/3 train-test split and use Llama 2-chat 70B with a handcrafted system prompt to generate safe refusal prompts for the train set. |
| Hardware Specification | Yes | We conducted our experiments on a cluster of A100 GPUs, with access ranging from 128 to 256 GPUs throughout the project. |
| Software Dependencies | No | The paper mentions various LLMs and tools used (e.g., "Llama 2-chat 7B", "Llama 3-Instruct 8B", "GPT-4", "Llama Guard"), but does not specify version numbers for general software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Table 10: List of hyperparameters used in safety experiments. RAINBOW TEAMING: Number of Initial Examples = 0; Batch Size = 32; Iterations = 2000; BLEU Similarity Filter = 0.6; Archive Sampling Temperature = 0.1; Archive Size = 100; Generator Parameters: Temperature = 0.7, Top-k = 0.95, Maximum Tokens = 256; Learning Rate = 2e-7; Batch Size = 32; Learning Rate Scheduler = Constant; Sequence Length = 4096. (A hedged configuration sketch of these values appears below the table.) |
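
For readability, the hyperparameters quoted in the Experiment Setup row (Table 10 of the paper) can be collected into a single configuration object. The sketch below is illustrative only: the field names and grouping are assumptions, the values are copied from the quoted table, and none of it is the authors' code.

```python
# A minimal sketch of the Table 10 safety-experiment hyperparameters quoted above,
# organized as a config object. Field names are illustrative, not the authors';
# values are copied from the quoted table.
from dataclasses import dataclass


@dataclass
class RainbowTeamingConfig:
    # Search / archive settings (RAINBOW TEAMING rows of Table 10)
    num_initial_examples: int = 0
    batch_size: int = 32
    iterations: int = 2000
    bleu_similarity_filter: float = 0.6   # candidates too similar to the archive are rejected
    archive_sampling_temperature: float = 0.1
    archive_size: int = 100

    # Generator (mutator LLM) sampling parameters
    generator_temperature: float = 0.7
    generator_top_k: float = 0.95         # as quoted in the paper's table
    generator_max_tokens: int = 256

    # Optimization settings listed in the same table
    learning_rate: float = 2e-7
    train_batch_size: int = 32
    lr_scheduler: str = "constant"
    sequence_length: int = 4096


config = RainbowTeamingConfig()
print(config)
```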
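The Pseudocode row only cites Algorithms 1 and 2 in Appendix B, which are not reproduced here. As orientation for how the quoted archive-related hyperparameters (archive size, archive sampling temperature, BLEU similarity filter) fit together, the following is a minimal, runnable sketch of a quality-diversity style loop in the spirit of RAINBOW TEAMING. All functions, descriptor names, and the scoring logic are placeholder assumptions rather than the paper's method; the LLM mutator and judge are replaced with trivial stubs.

```python
# A hedged sketch of an archive-based quality-diversity loop of the kind the
# paper's Algorithms 1-2 formalize. Every function below is a stub, not the
# authors' implementation.
import random

CATEGORIES = ["violence", "fraud", "cybercrime"]        # illustrative descriptors
STYLES = ["role play", "hypothetical", "misspellings"]  # illustrative descriptors


def mutate_prompt(parent, category, style):
    """Stub for the mutator LLM: rewrite `parent` toward the target cell."""
    return f"{parent} [mutated toward {category} / {style}]"


def attack_score(prompt):
    """Stub for target/judge evaluation: higher = more effective attack."""
    return random.random()


def too_similar(a, b, threshold=0.6):
    """Stub for the BLEU similarity filter that keeps the archive diverse."""
    return False


def rainbow_teaming_sketch(iterations=2000, seed_prompt="Tell me about X."):
    archive = {}  # maps (risk category, attack style) -> (prompt, score)
    for _ in range(iterations):
        target = (random.choice(CATEGORIES), random.choice(STYLES))
        parent = archive.get(target, (seed_prompt, 0.0))[0]
        candidate = mutate_prompt(parent, *target)

        # Reject candidates too close to the current occupant (BLEU filter).
        if target in archive and too_similar(candidate, archive[target][0]):
            continue

        # Replace the occupant only if the candidate attacks more effectively.
        score = attack_score(candidate)
        if target not in archive or score > archive[target][1]:
            archive[target] = (candidate, score)
    return archive


if __name__ == "__main__":
    print(len(rainbow_teaming_sketch(iterations=50)), "cells filled")
```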