VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives

Authors: Zhuofan Ying, Peter Hase, Mohit Bansal

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform experiments on three benchmark datasets: CLEVR-XAI [5], GQA [21], and VQA-HAT [11].
Researcher Affiliation | Academia | Department of Computer Science, University of North Carolina at Chapel Hill; {zfying, peter, mbansal}@cs.unc.edu
Pseudocode | No | The paper describes methods using mathematical equations and prose, but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | All supporting code for experiments in this paper is available at https://github.com/zfying/visfis.
Open Datasets | Yes | We perform experiments on three benchmark datasets: CLEVR-XAI [5], GQA [21], and VQA-HAT [11].
Dataset Splits | Yes | Table 1 (dataset split sizes) reports Train/Dev/Test-ID/Test-OOD splits of 83k/14k/21k/22k for CLEVR-XAI, 101k/20k/20k/20k for GQA-101k, and 36k/6k/9k/9k for VQA-HAT.
Hardware Specification | Yes | We use one NVIDIA A100 GPU for training.
Software Dependencies | No | The paper mentions using PyTorch but does not specify version numbers for any software dependencies.
Experiment Setup | Yes | All models are trained with the Adam [29] optimizer with a learning rate of 1e-4, except for LXMERT, which uses 1e-5. We train each model for 20 epochs with a batch size of 64.
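To make the reported training configuration concrete, below is a minimal PyTorch sketch of a training loop using only the quoted hyperparameters (Adam, learning rate 1e-4, or 1e-5 for LXMERT; 20 epochs; batch size 64). The names `model`, `train_set`, and `compute_loss` are hypothetical placeholders rather than identifiers from the released VisFIS code, and the loop omits the paper's feature-importance supervision objectives.

```python
import torch
from torch.utils.data import DataLoader

def train(model, train_set, compute_loss, is_lxmert=False, device="cuda"):
    # Learning rate from the reported setup: 1e-4 for all models except LXMERT (1e-5).
    lr = 1e-5 if is_lxmert else 1e-4
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    # Batch size 64, as reported in the experiment setup.
    loader = DataLoader(train_set, batch_size=64, shuffle=True)

    model.to(device)
    model.train()
    for epoch in range(20):  # 20 training epochs for every model
        for batch in loader:
            optimizer.zero_grad()
            # Placeholder: task loss (plus any VisFIS objectives in the actual code).
            loss = compute_loss(model, batch)
            loss.backward()
            optimizer.step()
    return model
```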