Partial success in closing the gap between human and machine vision

Authors: Robert Geirhos, Kantharaju Narayanappa, Benjamin Mitzkus, Tizian Thieringer, Matthias Bethge, Felix A. Wichmann, Wieland Brendel

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To answer this question, we tested human observers on a broad range of out-of-distribution (OOD) datasets, recording 85,120 psychophysical trials across 90 participants."
Researcher Affiliation | Academia | "¹University of Tübingen, ²International Max Planck Research School for Intelligent Systems"
Pseudocode | No | The paper describes its methods verbally and mathematically but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | "Our results give reason for cautious optimism: While there is still much room for improvement, the behavioural difference between human and machine vision is narrowing. In order to measure future progress, 17 OOD datasets with image-level human behavioural data and evaluation code are provided as a toolbox and benchmark at https://github.com/bethgelab/model-vs-human/." (a minimal evaluation sketch follows the table)
Open Datasets | Yes | "We collected human data for 17 generalisation datasets... OOD images were obtained from different sources: sketches from ImageNet-Sketch [16], stylized images from Stylized-ImageNet [17]... and the remaining twelve parametric datasets were adapted from [33]." and "17 OOD datasets with image-level human behavioural data and evaluation code are provided as a toolbox and benchmark at https://github.com/bethgelab/model-vs-human/."
Dataset Splits | No | The paper describes collecting human psychophysical data and evaluating pre-trained models on OOD datasets, but it does not specify train/validation/test splits for its own experimental setup or for the OOD datasets it collected.
Hardware Specification | No | The paper describes the monitor used for the human psychophysical experiments ("a 22″ monitor with 1920×1200 pixels resolution (refresh rate: 120 Hz)") but does not specify the hardware used to run the model evaluations or training.
Software Dependencies | No | The paper mentions evaluating PyTorch and TensorFlow models and cites the PyTorch library, but does not provide specific version numbers for any software dependencies used in its experiments.
Experiment Setup | Yes | "Stimuli were presented at the center of a 22″ monitor with 1920×1200 pixels resolution (refresh rate: 120 Hz). Viewing distance was 107 cm and target images subtended 3×3 degrees of visual angle. Human observers were presented with an image and asked to select the correct category out of 16 basic categories (such as chair, dog, airplane, etc.). Stimuli were balanced w.r.t. classes and presented in random order. ... presented images for 200 ms followed by a 1/f backward mask to limit the influence of recurrent processing." (a stimulus-geometry check follows the table)
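
For concreteness, the sketch below shows the kind of model evaluation the released benchmark performs: classify a folder of OOD images with a pretrained torchvision ResNet-50 and report top-1 accuracy. It is a minimal illustration, not the model-vs-human package's own API; the `ood_images/` folder layout (subfolders named by ImageNet class index) and torchvision >= 0.13 are assumptions, and the paper's mapping from 1,000 ImageNet classes to 16 basic categories is only noted in a comment.

```python
# Hedged sketch: evaluate a pretrained ImageNet model on a folder of OOD images.
# The folder layout and paths are illustrative assumptions; the official toolbox
# lives at https://github.com/bethgelab/model-vs-human/.
import torch
from torchvision import models, transforms, datasets

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical layout: ood_images/<imagenet_class_index>/*.png
dataset = datasets.ImageFolder("ood_images", transform=preprocess)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=False)

# ImageFolder assigns labels alphabetically; map them back to ImageNet indices.
folder_to_imagenet = torch.tensor([int(name) for name in dataset.classes])

model = models.resnet50(weights="IMAGENET1K_V1").eval()  # torchvision >= 0.13

correct, total = 0, 0
with torch.no_grad():
    for images, labels in loader:
        preds = model(images).argmax(dim=1)  # 1,000-way ImageNet predictions
        # The paper additionally maps the 1,000 ImageNet classes onto 16 basic
        # categories before comparing with human choices; omitted here for brevity.
        correct += (preds == folder_to_imagenet[labels]).sum().item()
        total += labels.numel()

print(f"OOD top-1 accuracy: {correct / total:.3f}")
```

The released toolbox goes further than this aggregate score: because image-level human responses are provided for all 17 OOD datasets, model decisions can be compared with human behaviour on a per-image basis.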
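As a sanity check on the quoted stimulus geometry, the snippet below converts the 3-degree visual angle at 107 cm viewing distance into centimetres and on-screen pixels. The 16:10 aspect ratio assumed for the 22-inch panel is not stated in the quote; the viewing distance, visual angle, and 1920-pixel width are.

```python
# Back-of-the-envelope check of the stimulus geometry quoted above.
# Only the 16:10 aspect ratio of the 22" panel is an assumption.
import math

viewing_distance_cm = 107.0
visual_angle_deg = 3.0

# Physical extent of a patch subtending 3 degrees at 107 cm.
size_cm = 2 * viewing_distance_cm * math.tan(math.radians(visual_angle_deg / 2))

# Horizontal width of a 22-inch 16:10 screen, then pixels per cm at 1920 px width.
diagonal_cm = 22 * 2.54
width_cm = diagonal_cm * 16 / math.hypot(16, 10)
px_per_cm = 1920 / width_cm

print(f"stimulus size: {size_cm:.1f} cm (~{size_cm * px_per_cm:.0f} px)")
# -> roughly 5.6 cm, i.e. about 227 px, close to the 224x224 crops typical
#    for ImageNet-trained models.
```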