Image Hijacks: Adversarial Images can Control Generative Models at Runtime

Authors: Luke Bailey, Euan Ong, Stuart Russell, Scott Emmons

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We systematically evaluate the performance of these image hijacks under ℓ∞-norm and patch constraints, and find that state-of-the-art text-based adversaries underperform image hijacks. (Section 4)
Researcher Affiliation | Academia | 1Harvard University, 2Cambridge University, 3University of California, Berkeley. Correspondence to: Scott Emmons <emmons@berkeley.edu>.
Pseudocode | No | The paper describes algorithms in text and provides diagrams (e.g., Figure 2 for the Behaviour Matching algorithm) but does not include pseudocode blocks.
Open Source Code | No | The paper does not provide a direct link to open-source code for its methodology. It mentions OpenAI's GPT-4, Google's Gemini, and other open-source models, but not its own code release.
Open Datasets | Yes | For our training context set C, we used the instructions from the Alpaca training set (Taori et al., 2023), a dataset of 52,000 instruction-output pairs generated from OpenAI's text-davinci-003.
Dataset Splits | Yes | For our training context set C, we used the instructions from the Alpaca training set (Taori et al., 2023), a dataset of 52,000 instruction-output pairs generated from OpenAI's text-davinci-003. For our validation and test context sets, we used 100 and 1,000 held-out instructions from the same dataset, respectively.
Hardware Specification | Yes | We trained for a maximum of 12 hours on an NVIDIA A100-SXM4-80GB GPU, identified the checkpoint with the highest validation success rate, and reported the test set results using this checkpoint.
Software Dependencies | No | The paper mentions the 'LLaVA LLaMA-2-13B-Chat model', 'CLIP ViT-L/14 vision encoder', 'LLaMA-2-13B-Chat language model', 'LangChain', 'GPT-3.5-turbo LLM', and 'Pillow Python package' but does not specify version numbers for any of these software components.
Experiment Setup | Yes | We trained all specific-string image hijacks with stochastic gradient descent, using a learning rate of 3 for patch-based attacks and 0.03 for all other attacks. For our training context set C, we used the instructions from the Alpaca training set (Taori et al., 2023), a dataset of 52,000 instruction-output pairs generated from OpenAI's text-davinci-003.
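
The Dataset Splits, Hardware Specification, and Experiment Setup rows above describe the attack's training recipe: stochastic gradient descent on the pixels of a single image against a frozen vision-language model, with contexts sampled from Alpaca instructions and held-out validation/test contexts. Since the paper reports no code release, the sketch below is not the authors' implementation. It uses a toy stand-in model in place of the frozen LLaVA model, and the names ToyVLM and train_hijack, the tensor shapes, and the vocabulary size are illustrative assumptions; it only shows the general shape of a behaviour-matching loop for a specific-string hijack (sample a context, teacher-force the target string, take an SGD step on the image with the quoted learning rate of 0.03).

import torch
import torch.nn.functional as F


class ToyVLM(torch.nn.Module):
    """Stand-in for a frozen vision-language model (the paper uses LLaVA).

    Maps (image, token ids) -> next-token logits; only the interface matters here.
    """

    def __init__(self, vocab_size: int = 32000, dim: int = 64):
        super().__init__()
        self.vision = torch.nn.Linear(3 * 224 * 224, dim)
        self.embed = torch.nn.Embedding(vocab_size, dim)
        self.head = torch.nn.Linear(dim, vocab_size)

    def forward(self, image: torch.Tensor, ids: torch.Tensor) -> torch.Tensor:
        img_feat = self.vision(image.flatten(1))         # (B, dim)
        tok_feat = self.embed(ids)                       # (B, T, dim)
        hidden = tok_feat + img_feat.unsqueeze(1)        # crude image/text fusion
        return self.head(hidden)                         # (B, T, vocab)


def train_hijack(model, contexts, target_ids, lr=0.03, steps=200):
    """Optimise image pixels (model frozen) so the model emits target_ids after any context."""
    image = torch.rand(1, 3, 224, 224, requires_grad=True)   # the adversarial image
    opt = torch.optim.SGD([image], lr=lr)                     # lr 0.03 per the quote above
    for _ in range(steps):
        ctx = contexts[torch.randint(len(contexts), (1,)).item()]  # sample a training context
        ids = torch.cat([ctx, target_ids], dim=1)                  # prompt + forced target string
        logits = model(image.clamp(0, 1), ids[:, :-1])             # teacher forcing
        tgt_logits = logits[:, ctx.size(1) - 1:, :]                # positions that predict the target
        loss = F.cross_entropy(
            tgt_logits.reshape(-1, tgt_logits.size(-1)),
            target_ids.reshape(-1),
        )
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            image.clamp_(0, 1)  # keep pixels valid; a real attack projects onto its constraint set
    return image.detach()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = ToyVLM()
    for p in model.parameters():
        p.requires_grad_(False)                                     # the VLM stays frozen
    contexts = [torch.randint(0, 32000, (1, 8)) for _ in range(4)]  # stand-in for Alpaca instructions
    target = torch.randint(0, 32000, (1, 6))                        # stand-in for the target string's tokens
    hijack = train_hijack(model, contexts, target)
    print(hijack.shape)                                             # torch.Size([1, 3, 224, 224])

A real attack would swap ToyVLM for the frozen LLaVA model, tokenise the Alpaca instructions and target string with the model's own tokenizer, project onto the chosen ℓ∞ or patch constraint after each step, and select the checkpoint with the highest validation success rate as described in the Hardware Specification row.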