LANCE: Stress-testing Visual Models by Generating Language-guided Counterfactual Images

Authors: Viraj Prabhu, Sriram Yenamandra, Prithvijit Chattopadhyay, Judy Hoffman

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We benchmark the performance of a diverse set of pretrained models on our generated data and observe significant and consistent performance drops. We further analyze model sensitivity across different types of edits, and demonstrate its applicability at surfacing previously unknown class-level model biases in ImageNet. (Abstract) In Section 4.1, we overview our experimental setup, describing the data, metrics, baselines, and implementation details used. Next, we present our results (Section 4.2), comparing the performance of a diverse set of pretrained models on the subset of the ImageNet test set, and on our generated counterfactual test sets. (Section 4 introduction)
Researcher Affiliation | Academia | Viraj Prabhu, Sriram Yenamandra, Prithvijit Chattopadhyay, Judy Hoffman; Georgia Institute of Technology; {virajp,sriramy,prithvijit3,judy}@gatech.edu
Pseudocode | Yes | Algorithm 1: Generating Language-guided Counterfactual Images (Page 5)
Open Source Code | Yes | Code: https://github.com/virajprabhu/lance. (Abstract)
Open Datasets | Yes | Dataset. We evaluate LANCE on a subset of the ImageNet [2] validation set. (Section 4.1) All source images belong to the ImageNet dataset [2], which is distributed under a BSD-3 license that permits research and commercial use. (Appendix A)
Dataset Splits | Yes | We evaluate LANCE on a subset of the ImageNet [2] validation set. Specifically, we study the 15 classes included in the Hard ImageNet benchmark [62]. ... We consider the original ImageNet validation sets for these 15 classes, with 50 images/class, as our base set. (Section 4.1)
Hardware Specification | Yes | We run all experiments on a single NVIDIA A40 GPU. (Appendix F)
Software Dependencies | Yes | We use PyTorch [65] for all experiments. (Section 4.1) Stable Diffusion [16] version 1.4 (Table 6, Image Editing)
Experiment Setup | Yes | We include additional implementation details for hyperparameters used by LANCE for caption and image editing in Table 6. (Appendix F) Table 6: Hyperparameter values used for caption (top left), LLAMA finetuning (top right) and image editing (bottom). (Page 13)
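
The Dataset Splits entry describes a base set of 15 classes with 50 validation images per class, which works out to 15 × 50 = 750 images. The snippet below is a minimal sketch of that kind of per-class subsetting, not the authors' code: the function name `build_base_set`, the integer class ids, and the synthetic file names are all hypothetical placeholders.

```python
from collections import defaultdict


def build_base_set(samples, target_classes, per_class=50):
    """Keep up to `per_class` samples for each class in `target_classes`.

    `samples` is an iterable of (path, label) pairs. This helper is a
    hypothetical illustration, not part of the LANCE codebase.
    """
    wanted = set(target_classes)
    by_class = defaultdict(list)
    for path, label in samples:
        if label in wanted and len(by_class[label]) < per_class:
            by_class[label].append(path)
    return dict(by_class)


# Synthetic stand-in for a validation set: 20 classes, 50 images each.
# Real class ids and file paths would come from the dataset itself.
samples = [(f"img_{c}_{i}.jpg", c) for c in range(20) for i in range(50)]

# Treat classes 0..14 as the 15 classes of interest (placeholder ids).
base = build_base_set(samples, target_classes=range(15))
total = sum(len(paths) for paths in base.values())
print(len(base), total)  # prints: 15 750
```

In practice the placeholder ids would be replaced by the labels of the 15 classes named in the paper, and `samples` by the actual validation file list.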