Image Hijacks: Adversarial Images can Control Generative Models at Runtime
Authors: Luke Bailey, Euan Ong, Stuart Russell, Scott Emmons
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We systematically evaluate the performance of these image hijacks under ℓ∞-norm and patch constraints, and find that state-of-the-art text-based adversaries underperform image hijacks. (Section 4). |
| Researcher Affiliation | Academia | ¹Harvard University, ²Cambridge University, ³University of California, Berkeley. Correspondence to: Scott Emmons <emmons@berkeley.edu>. |
| Pseudocode | No | The paper describes algorithms in text and provides diagrams (e.g., Figure 2 for Behaviour Matching algorithm) but does not include pseudocode blocks. |
| Open Source Code | No | The paper does not provide a direct link to open-source code for the methodology. It mentions OpenAI's GPT-4, Google's Gemini, and various open-source models, but does not announce its own code release. |
| Open Datasets | Yes | For our training context set C, we used the instructions from the Alpaca training set (Taori et al., 2023), a dataset of 52,000 instruction-output pairs generated from OpenAI's text-davinci-003. |
| Dataset Splits | Yes | For our training context set C, we used the instructions from the Alpaca training set (Taori et al., 2023), a dataset of 52,000 instruction-output pairs generated from OpenAI's text-davinci-003. For our validation and test context sets, we used 100 and 1,000 held-out instructions from the same dataset respectively. |
| Hardware Specification | Yes | We trained for a maximum of 12 hours on an NVIDIA A100-SXM4-80GB GPU, identified the checkpoint with the highest validation success rate, and reported the test set results using this checkpoint. |
| Software Dependencies | No | The paper mentions 'LLaVA LLaMA-2-13B-Chat model', 'CLIP ViT-L/14 vision encoder', 'LLaMA-2-13b Chat language model', 'LangChain', 'GPT-3.5-turbo LLM', and 'Pillow Python package' but does not specify version numbers for any of these software components. |
| Experiment Setup | Yes | We trained all specific string image hijacks with stochastic gradient descent, using a learning rate of 3 for patch-based attacks and 0.03 for all other attacks. For our training context set C, we used the instructions from the Alpaca training set (Taori et al., 2023), a dataset of 52,000 instruction-output pairs generated from OpenAI's text-davinci-003. |
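
The Experiment Setup row above quotes the paper's optimisation details: stochastic gradient descent on the adversarial image, with learning rate 0.03 for full-image attacks and 3 for patch-based attacks, using Alpaca instructions as the training context set (with 100 validation and 1,000 test instructions held out). The sketch below illustrates that style of specific-string image optimisation in PyTorch. It is a minimal toy example, not the paper's implementation: the stand-in victim network, target tokens, image size, and loss are assumptions made for illustration, and the paper's LLaVA pipeline, context batching, and held-out splits are omitted.

```python
# Minimal sketch of specific-string image-hijack training via SGD on image pixels.
# Assumptions (not from the paper): a toy linear "victim" model, a 32x32 image,
# a single target token, and cross-entropy loss; lr=0.03 as reported for
# non-patch attacks (the paper reports lr=3 for patch-based attacks).
import torch

torch.manual_seed(0)

# Stand-in "victim" model: maps an image to logits over a toy vocabulary.
# The paper instead optimises against a frozen vision-language model (LLaVA).
victim = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
for p in victim.parameters():
    p.requires_grad_(False)  # only the adversarial image is optimised

target = torch.tensor([3])               # toy stand-in for the target string's tokens
image = torch.zeros(1, 3, 32, 32, requires_grad=True)
opt = torch.optim.SGD([image], lr=0.03)  # lr=3 would correspond to patch attacks

for step in range(200):
    opt.zero_grad()
    logits = victim(image)
    loss = torch.nn.functional.cross_entropy(logits, target)
    loss.backward()
    opt.step()
    with torch.no_grad():
        image.clamp_(0.0, 1.0)           # keep pixels in the valid image range

print(f"final loss: {loss.item():.4f}")
```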