LANCE: Stress-testing Visual Models by Generating Language-guided Counterfactual Images
Authors: Viraj Prabhu, Sriram Yenamandra, Prithvijit Chattopadhyay, Judy Hoffman
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We benchmark the performance of a diverse set of pretrained models on our generated data and observe significant and consistent performance drops. We further analyze model sensitivity across different types of edits, and demonstrate its applicability at surfacing previously unknown class-level model biases in ImageNet. (Abstract) In Section 4.1, we overview our experimental setup, describing the data, metrics, baselines, and implementation details used. Next, we present our results (Section 4.2), comparing the performance of a diverse set of pretrained models on the subset of the ImageNet test set, and on our generated counterfactual test sets. (Section 4 introduction) |
| Researcher Affiliation | Academia | Viraj Prabhu, Sriram Yenamandra, Prithvijit Chattopadhyay, Judy Hoffman; Georgia Institute of Technology; {virajp,sriramy,prithvijit3,judy}@gatech.edu |
| Pseudocode | Yes | Algorithm 1 Generating Language-guided Counterfactual Images (Page 5) |
| Open Source Code | Yes | Code: https://github.com/virajprabhu/lance. (Abstract) |
| Open Datasets | Yes | Dataset. We evaluate LANCE on a subset of the ImageNet [2] validation set. (Section 4.1) All source images belong to the ImageNet dataset [2], which is distributed under a BSD-3 license that permits research and commercial use. (Appendix A) |
| Dataset Splits | Yes | We evaluate LANCE on a subset of the ImageNet [2] validation set. Specifically, we study the 15 classes included in the Hard ImageNet benchmark [62]. ... We consider the original ImageNet validation sets for these 15 classes, with 50 images/class, as our base set. (Section 4.1) |
| Hardware Specification | Yes | We run all experiments on a single NVIDIA A40 GPU. (Appendix F) |
| Software Dependencies | Yes | We use PyTorch [65] for all experiments. (Section 4.1) Stable Diffusion [16] version 1.4 (Table 6, Image Editing) |
| Experiment Setup | Yes | We include additional implementation details for hyperparameters used by LANCE for caption and image editing in Table 6. (Appendix F) Table 6: Hyperparameter values used for caption (top left), LLaMA finetuning (top right) and image editing (bottom). (Page 13) |
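The Research Type row describes comparing pretrained models on the original ImageNet subset and on the generated counterfactual sets, overall and across edit types. The sketch below shows one plausible way to quantify that comparison as an accuracy drop; it is not the paper's exact metric. It assumes each counterfactual is paired with the base image it was generated from and keeps that image's ground-truth label. The function names and the edit-type strings in the usage are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def accuracy(preds: List[int], labels: List[int]) -> float:
    """Top-1 accuracy as a fraction in [0, 1]."""
    if not labels or len(preds) != len(labels):
        raise ValueError("preds and labels must be non-empty and of equal length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def accuracy_drop(base_preds: List[int], cf_preds: List[int], labels: List[int]) -> float:
    """Accuracy on the original images minus accuracy on their counterfactuals.

    Assumes cf_preds[i] is the prediction on the counterfactual generated from
    base image i, and that the edit preserves the ground-truth label labels[i].
    """
    return accuracy(base_preds, labels) - accuracy(cf_preds, labels)


def drop_by_edit_type(
    records: Iterable[Tuple[str, int, int, int]],
) -> Dict[str, float]:
    """Group (edit_type, base_pred, cf_pred, label) records and compute the drop per group."""
    groups: Dict[str, Tuple[List[int], List[int], List[int]]] = defaultdict(
        lambda: ([], [], [])
    )
    for edit_type, base_pred, cf_pred, label in records:
        base, cf, ys = groups[edit_type]
        base.append(base_pred)
        cf.append(cf_pred)
        ys.append(label)
    return {k: accuracy_drop(b, c, y) for k, (b, c, y) in groups.items()}
```

Reporting the drop per edit type, rather than only as a single number, is what makes it possible to ask which kinds of edits a given model is most sensitive to.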
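The Pseudocode and Experiment Setup rows point to Algorithm 1 and to hyperparameters for caption editing, LLaMA finetuning, and image editing with Stable Diffusion v1.4. The skeleton below is a minimal sketch of how such a stage-wise pipeline could be wired together, assuming it runs in this order: caption the image, perturb the caption, edit the image to match the new caption, then optionally filter the result. It is not the authors' Algorithm 1. Every callable is a hypothetical stand-in injected by the caller, and the `accept_fn` filtering hook in particular is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple


@dataclass
class Counterfactual:
    source_id: str
    caption: str
    edited_caption: str
    image: Any  # whatever type the (hypothetical) image editor returns


def generate_counterfactuals(
    images: Sequence[Tuple[str, Any]],
    caption_fn: Callable[[Any], str],
    edit_caption_fn: Callable[[str], List[str]],
    edit_image_fn: Callable[[Any, str, str], Any],
    accept_fn: Callable[[Any, Any, str, str], bool] = lambda *_: True,
) -> List[Counterfactual]:
    """Caption each image, perturb the caption, edit the image to match, and filter.

    All four callables are hypothetical stand-ins: caption_fn for a captioning
    model, edit_caption_fn for a caption editor (e.g. a finetuned LLaMA),
    edit_image_fn for a text-guided image editor (e.g. built on Stable
    Diffusion v1.4), and accept_fn for an optional quality filter.
    """
    results: List[Counterfactual] = []
    for image_id, image in images:
        caption = caption_fn(image)
        # One source caption may yield several perturbed captions.
        for new_caption in edit_caption_fn(caption):
            edited = edit_image_fn(image, caption, new_caption)
            if accept_fn(image, edited, caption, new_caption):
                results.append(Counterfactual(image_id, caption, new_caption, edited))
    return results
```

Passing the models in as callables keeps the control flow testable with toy functions (for example, an `accept_fn` that rejects any edit turning "dog" into "cat", since such an edit would change the class label) while leaving the actual captioner, caption editor, and image editor as swappable components.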
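The Dataset Splits row fixes the base set at the ImageNet validation images of the 15 Hard ImageNet classes, 50 images per class, so the base set holds 15 × 50 = 750 images. The helper below sketches that selection step over (path, label) pairs. The function name, the `per_class` cap, and the class IDs in the usage are hypothetical placeholders, because the specific 15 class IDs are not listed in this table.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def build_base_set(
    samples: Iterable[Tuple[str, Hashable]],
    class_ids: Set[Hashable],
    per_class: int = 50,
) -> Dict[Hashable, List[str]]:
    """Keep up to per_class image paths for each label in class_ids, in input order."""
    buckets: Dict[Hashable, List[str]] = defaultdict(list)
    for path, label in samples:
        # Short-circuit: labels outside class_ids never create a bucket.
        if label in class_ids and len(buckets[label]) < per_class:
            buckets[label].append(path)
    return dict(buckets)
```

Because the ImageNet validation set already contains exactly 50 images per class, the `per_class` cap acts mainly as a sanity check here; with the 15 target classes, the total across buckets should come to 750.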