Language-biased image classification: evaluation based on semantic representations
Authors: Yoann Lemesle, Masataka Sawayama, Guillermo Valle-Perez, Maxime Adolphe, Hélène Sauzéon, Pierre-Yves Oudeyer
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The present study introduces methodological tools from the cognitive science literature to assess the biases of artificial models. Specifically, we introduce a benchmark task to test whether words superimposed on images can distort the image classification across different category levels and, if it can, whether the perturbation is due to the shared semantic representation between language and vision. Our dataset is a set of word-embedded images and consists of a mixture of natural image datasets and hierarchical word labels with superordinate/basic category levels. Using this benchmark test, we evaluate the CLIP model. |
| Researcher Affiliation | Collaboration | Yoann Lemesle: INRIA, France; ENS Rennes, France. Masataka Sawayama: INRIA, France. Guillermo Valle-Perez: INRIA, France. Maxime Adolphe: INRIA, France. Hélène Sauzéon: INRIA, France; Université de Bordeaux, France. Pierre-Yves Oudeyer: INRIA, France; Microsoft Research Montreal. |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The present study provides an open-source code to reproduce the results and the full dataset (https://github.com/flowersteam/picture-word-interference). |
| Open Datasets | Yes | We use the two image datasets from the cognitive neuroscience literature on object recognition (Cichy et al., 2016; Mohsenzadeh et al., 2019)... For the word labels, we extracted the superordinate category words from the MS-COCO dataset and the basic category words from the MS-COCO and CIFAR-100 datasets (Lin et al., 2014; Krizhevsky et al., 2009). |
| Dataset Splits | No | The paper evaluates a pre-trained model and constructs a dataset for evaluation, but it does not describe training/validation/test splits, either for training a model of its own or for the constructed evaluation dataset. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using pre-trained models like CLIP and Word2Vec but does not list specific software dependencies (e.g., Python, PyTorch, TensorFlow) with version numbers. |
| Experiment Setup | No | The paper evaluates a pre-trained model and describes how it is used for classification (e.g., using 'a photo of a [label]' prompt), but it does not provide hyperparameters or system-level training settings, as it is not training a new model from scratch. |
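
The evaluation setup summarized in the table (word-embedded images classified zero-shot with CLIP using "a photo of a [label]" prompts) can be approximated as follows. This is a minimal sketch, not the authors' released pipeline (their code is in the repository linked under Open Source Code); it assumes OpenAI's `clip` package, PyTorch, and Pillow, and the image path, superimposed word, candidate labels, font, and text placement are illustrative placeholders.

```python
# Minimal sketch of the picture-word interference evaluation described above.
# Assumptions: OpenAI's `clip` package, PyTorch, and Pillow are installed;
# "photo.jpg", the labels, the font, and the text placement are illustrative.
import torch
import clip
from PIL import Image, ImageDraw, ImageFont

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def superimpose_word(image_path, word, position=(10, 10), size=48):
    """Return a copy of the image with `word` drawn on top of it."""
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    try:
        font = ImageFont.truetype("DejaVuSans-Bold.ttf", size)  # font choice is illustrative
    except OSError:
        font = ImageFont.load_default()
    draw.text(position, word, fill="white", font=font)
    return img

def classify(img, labels):
    """Zero-shot CLIP classification with 'a photo of a [label]' prompts."""
    prompts = clip.tokenize([f"a photo of a {label}" for label in labels]).to(device)
    image = preprocess(img).unsqueeze(0).to(device)
    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(prompts)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
    return labels[probs.argmax().item()], probs.squeeze(0).tolist()

# Example: does a superimposed word shift the prediction relative to the clean image?
labels = ["dog", "cat", "airplane"]  # illustrative basic-level labels
clean = classify(Image.open("photo.jpg").convert("RGB"), labels)
perturbed = classify(superimpose_word("photo.jpg", "cat"), labels)
print("clean:", clean, "word-embedded:", perturbed)
```

In the paper, this comparison is run across superordinate- and basic-level label sets drawn from MS-COCO and CIFAR-100; the sketch above shows only the per-image mechanics of superimposing a word and querying CLIP with the prompt template.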