Language-biased image classification: evaluation based on semantic representations
Authors: Yoann Lemesle, Masataka Sawayama, Guillermo Valle-Perez, Maxime Adolphe, Hélène Sauzéon, Pierre-Yves Oudeyer
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The present study introduces methodological tools from the cognitive science literature to assess the biases of artificial models. Specifically, we introduce a benchmark task to test whether words superimposed on images can distort the image classification across different category levels and, if it can, whether the perturbation is due to the shared semantic representation between language and vision. Our dataset is a set of word-embedded images and consists of a mixture of natural image datasets and hierarchical word labels with superordinate/basic category levels. Using this benchmark test, we evaluate the CLIP model. |
| Researcher Affiliation | Collaboration | Yoann Lemesle: INRIA, France; ENS Rennes, France. Masataka Sawayama: INRIA, France. Guillermo Valle-Perez: INRIA, France. Maxime Adolphe: INRIA, France. Hélène Sauzéon: INRIA, France; Université de Bordeaux, France. Pierre-Yves Oudeyer: INRIA, France; Microsoft Research Montreal. |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The present study provides an open-source code to reproduce the results and the full dataset (https://github.com/flowersteam/picture-word-interference). |
| Open Datasets | Yes | We use the two image datasets from the cognitive neuroscience literature on object recognition (Cichy et al., 2016; Mohsenzadeh et al., 2019)... For the word labels, we extracted the superordinate category words from the MS-COCO dataset and the basic category words from the MS-COCO and CIFAR-100 datasets (Lin et al., 2014; Krizhevsky et al., 2009). |
| Dataset Splits | No | The paper evaluates a pre-trained model and constructs a dataset for evaluation, but it does not describe training/validation/test splits, either for training a model of its own or for the constructed evaluation dataset. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using pre-trained models like CLIP and Word2Vec but does not list specific software dependencies (e.g., Python, PyTorch, TensorFlow) with version numbers. |
| Experiment Setup | No | The paper evaluates a pre-trained model and describes how it is used for classification (e.g., using 'a photo of a [label]' prompt), but it does not provide hyperparameters or system-level training settings, as it is not training a new model from scratch. |
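
The evaluation setup summarized in the table (word-embedded images classified zero-shot with CLIP using "a photo of a [label]" prompts) can be approximated as follows. This is a minimal sketch, not the authors' released pipeline (their code is in the repository linked under Open Source Code); it assumes OpenAI's `clip` package, PyTorch, and Pillow, and the image path, superimposed word, candidate labels, font, and text placement are illustrative placeholders.

```python
# Minimal sketch of the picture-word interference evaluation described above.
# Assumptions: OpenAI's `clip` package, PyTorch, and Pillow are installed;
# "photo.jpg", the labels, the font, and the text placement are illustrative.
import torch
import clip
from PIL import Image, ImageDraw, ImageFont

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def superimpose_word(image_path, word, position=(10, 10), size=48):
    """Return a copy of the image with `word` drawn on top of it."""
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    try:
        font = ImageFont.truetype("DejaVuSans-Bold.ttf", size)  # font choice is illustrative
    except OSError:
        font = ImageFont.load_default()
    draw.text(position, word, fill="white", font=font)
    return img

def classify(img, labels):
    """Zero-shot CLIP classification with 'a photo of a [label]' prompts."""
    prompts = clip.tokenize([f"a photo of a {label}" for label in labels]).to(device)
    image = preprocess(img).unsqueeze(0).to(device)
    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(prompts)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
    return labels[probs.argmax().item()], probs.squeeze(0).tolist()

# Example: does a superimposed word shift the prediction relative to the clean image?
labels = ["dog", "cat", "airplane"]  # illustrative basic-level labels
clean = classify(Image.open("photo.jpg").convert("RGB"), labels)
perturbed = classify(superimpose_word("photo.jpg", "cat"), labels)
print("clean:", clean, "word-embedded:", perturbed)
```

In the paper, this comparison is run across superordinate- and basic-level label sets drawn from MS-COCO and CIFAR-100; the sketch above shows only the per-image mechanics of superimposing a word and querying CLIP with the prompt template.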