Perceptual Score: What Data Modalities Does Your Model Perceive?

Authors: Itai Gat, Idan Schwartz, Alex Schwing

Venue: NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using the perceptual score, we find a surprisingly consistent trend across four popular datasets: recent, more accurate state-of-the-art multi-modal models for visual question-answering or visual dialog tend to perceive the visual data less than their predecessors. This trend is concerning as answers are hence increasingly inferred from textual cues only. Using the perceptual score also helps to analyze model biases by decomposing the score into data subset contributions. We hope to spur a discussion on the perceptiveness of multi-modal models and also hope to encourage the community working on multi-modal classifiers to start quantifying perceptiveness via the proposed perceptual score.
Researcher Affiliation | Collaboration | Itai Gat (Technion), Idan Schwartz (Technion, NetApp), Alexander Schwing (University of Illinois at Urbana-Champaign)
Pseudocode | No | The paper describes the calculation of the perceptual score with mathematical formulas and illustrative figures, but it does not include a dedicated pseudocode or algorithm block.
Open Source Code | Yes | For all the models, we used the official implementations. For more details please see our implementation: https://github.com/itaigat/perceptual-score
Open Datasets | Yes | We use the VQAv2 dataset [20], which contains 443,757 image-question pairs in the train set and 214,354 in the validation set. We also assess the perceptiveness of models trained on Visual Question Answering: Changing Priors (VQA-CP) data [21]... Social IQ [28] proposes an unconstrained benchmark... We show our results on the VisDial v1.0 dataset [58]... We use the audiovisual crowd counting (DISCO) dataset [61]...
Dataset Splits | Yes | We use the VQAv2 dataset [20], which contains 443,757 image-question pairs in the train set and 214,354 in the validation set. ... The new split consists of 438,183 training samples, and 219,928 samples for validation. ... The dataset is split into 37,191 training samples, and 5,320 validation set samples. ... 123,287 images are used for training, 2,000 images for validation, and 8,000 images for testing [58].
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or cloud computing specifications used for running the experiments.
Software Dependencies | No | The paper mentions using the 'NLTK package [57]' but does not provide specific version numbers for NLTK or any other software dependencies, which are required for reproducibility.
Experiment Setup | Yes | Experimental setup: We compute the perceptual score based on five permutations per sample. We calculate the perceptual score five times with different permutations and report the mean score along with the standard deviations. We find the expectation to converge quickly and to be stable. For all the models, we used the official implementations.
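
To make the permutation protocol in the last row concrete, below is a minimal sketch of the evaluation loop, assuming NumPy arrays and a hypothetical `model.predict(images, questions)` interface (neither is the paper's actual API). It reports the raw accuracy drop; the paper's exact score, including any normalization, is defined in the linked repository.

```python
import numpy as np

def accuracy(model, images, questions, labels):
    # Hypothetical predict interface: returns one predicted answer per sample.
    preds = model.predict(images, questions)
    return float(np.mean(preds == labels))

def visual_perceptual_score(model, images, questions, labels, n_perms=5, seed=0):
    """Estimate how much the model relies on the visual modality.

    Permuting the images across the evaluation set breaks the
    image-question alignment while preserving the marginal image
    distribution; the resulting accuracy drop is attributed to the
    visual modality. Following the reported setup, the score is
    averaged over n_perms permutations and returned with its
    standard deviation.
    """
    rng = np.random.default_rng(seed)
    base_acc = accuracy(model, images, questions, labels)
    drops = []
    for _ in range(n_perms):
        perm = rng.permutation(len(images))  # shuffle images; questions/labels stay fixed
        drops.append(base_acc - accuracy(model, images[perm], questions, labels))
    return float(np.mean(drops)), float(np.std(drops))
```

Permuting only one modality is what isolates its contribution: the shuffled images are still valid inputs from the same distribution, so any accuracy lost can only come from the destroyed image-question pairing.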