Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives

Authors: Zhuofan Ying, Peter Hase, Mohit Bansal

NeurIPS 2022 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We perform experiments on three benchmark datasets: CLEVR-XAI [5], GQA [21], and VQA-HAT [11]. |
| Researcher Affiliation | Academia | Department of Computer Science, University of North Carolina at Chapel Hill. EMAIL |
| Pseudocode | No | The paper describes its methods with mathematical equations and prose, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | All supporting code for experiments in this paper is available at https://github.com/zfying/visfis. |
| Open Datasets | Yes | We perform experiments on three benchmark datasets: CLEVR-XAI [5], GQA [21], and VQA-HAT [11]. |
| Dataset Splits | Yes | Table 1: Dataset split sizes (Train / Dev / Test-ID / Test-OOD). CLEVR-XAI: 83k / 14k / 21k / 22k. GQA-101k: 101k / 20k / 20k / 20k. VQA-HAT: 36k / 6k / 9k / 9k. |
| Hardware Specification | Yes | We use one NVIDIA A100 GPU for training. |
| Software Dependencies | No | The paper mentions using PyTorch but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | All models are trained with the Adam [29] optimizer with a learning rate of 1e-4, except for LXMERT, which uses 1e-5. We train each model for 20 epochs with a batch size of 64. |
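The reported experiment setup (Adam optimizer, learning rate 1e-4 except 1e-5 for LXMERT, 20 epochs, batch size 64) can be captured as a small configuration helper. This is a minimal sketch for illustration only; the function name and dictionary layout are assumptions, not taken from the authors' released code.

```python
def training_config(model_name: str) -> dict:
    """Return the training hyperparameters reported in the paper's setup.

    NOTE: illustrative helper, not part of the VisFIS codebase. All models
    use Adam with lr=1e-4, except LXMERT, which uses lr=1e-5.
    """
    lr = 1e-5 if model_name.lower() == "lxmert" else 1e-4
    return {
        "optimizer": "Adam",   # Adam [29], as reported in the paper
        "learning_rate": lr,
        "epochs": 20,
        "batch_size": 64,
    }


# Example: LXMERT gets the smaller learning rate; other models get 1e-4.
assert training_config("LXMERT")["learning_rate"] == 1e-5
assert training_config("UpDn")["learning_rate"] == 1e-4
```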