Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies

Authors: Itai Gat, Idan Schwartz, Alexander Schwing, Tamir Hazan

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On the two challenging multi-modal datasets VQA-CPv2 and Social IQ, we obtain state-of-the-art results while more uniformly exploiting the modalities. In addition, we demonstrate the efficacy of our method on Colored MNIST.
Researcher Affiliation | Academia | Itai Gat (Technion), Idan Schwartz (Technion), Alexander Schwing (UIUC), Tamir Hazan (Technion)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | Colored MNIST [5, 6] is a synthetic dataset based on MNIST [45]. VQA-CPv2 [1] is a re-shuffle of the VQAv2 [46] dataset. The Social IQ dataset is designed for developing models that understand social situations in videos; each sample consists of a video clip, a question, and an answer. Following the settings of Kim et al. [6], the models are also evaluated on the biased Dogs and Cats dataset.
Dataset Splits | Yes | Colored MNIST: the train and validation sets consist of 60,000 and 10,000 samples, respectively. VQA-CPv2: 438,183 samples in the train set and 219,928 samples in the test set. Social IQ: 37,191 training samples and 5,320 validation samples.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor speeds, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiments.
Experiment Setup | Yes | Training with functional-Fisher-information regularization on TB1 and testing on TB2, with λ (see Eq. (12)) set to 3e-10, yields 94.71% accuracy (a sketch of this regularizer follows the table).
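
For context, the regularizer named in the last row is built from the functional Fisher information of the prediction with respect to each modality's input; per the paper, the log-Sobolev inequality ties this quantity to the functional entropy the method seeks to maximize. Below is a minimal PyTorch sketch of such a per-modality gradient penalty. The function name, the `inputs_per_modality` interface, and the sign with which the terms enter the loss are illustrative assumptions, not the authors' released code; Eq. (12) in the paper gives the precise form.

```python
import torch
import torch.nn.functional as F

def functional_fisher_terms(model, inputs_per_modality, labels):
    """Compute the cross-entropy loss and, for each modality, a
    functional-Fisher-information-style term E[||grad_x p||^2 / p],
    where p is the probability the model assigns to the true class
    and the gradient is taken w.r.t. that modality's input tensor.
    (Hypothetical helper; the interface is an assumption, not the
    paper's implementation.)"""
    for x in inputs_per_modality:
        x.requires_grad_(True)

    logits = model(*inputs_per_modality)
    ce = F.cross_entropy(logits, labels)

    # Probability assigned to the ground-truth class for each sample.
    p_true = F.softmax(logits, dim=-1).gather(1, labels.unsqueeze(1)).squeeze(1)

    # One extra gradient pass; create_graph=True makes the penalty itself
    # differentiable so it can be trained through.
    grads = torch.autograd.grad(p_true.sum(), list(inputs_per_modality),
                                create_graph=True)

    terms = []
    for g in grads:
        sq_norm = g.flatten(1).pow(2).sum(dim=1)  # ||grad_x p||^2 per sample
        terms.append((sq_norm / p_true.clamp_min(1e-12)).mean())
    return ce, terms

# Usage sketch (lam = 3e-10 matches the Colored MNIST row above). The sign
# here is chosen so that minimizing the total loss *increases* each
# modality's Fisher term, in line with the paper's stated goal of
# maximizing functional entropy across modalities; consult Eq. (12) for
# the exact combination.
# ce, terms = functional_fisher_terms(model, [image_x, color_x], labels)
# loss = ce - 3e-10 * sum(terms)
```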