Partial success in closing the gap between human and machine vision
Authors: Robert Geirhos, Kantharaju Narayanappa, Benjamin Mitzkus, Tizian Thieringer, Matthias Bethge, Felix A. Wichmann, Wieland Brendel
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To answer this question, we tested human observers on a broad range of out-of-distribution (OOD) datasets, recording 85,120 psychophysical trials across 90 participants. |
| Researcher Affiliation | Academia | University of Tübingen; International Max Planck Research School for Intelligent Systems |
| Pseudocode | No | The paper describes its methods verbally and mathematically but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our results give reason for cautious optimism: While there is still much room for improvement, the behavioural difference between human and machine vision is narrowing. In order to measure future progress, 17 OOD datasets with image-level human behavioural data and evaluation code are provided as a toolbox and benchmark at https://github.com/bethgelab/model-vs-human/. |
| Open Datasets | Yes | We collected human data for 17 generalisation datasets... OOD images were obtained from different sources: sketches from ImageNet-Sketch [16], stylized images from Stylized-ImageNet [17]... and the remaining twelve parametric datasets were adapted from [33]. 17 OOD datasets with image-level human behavioural data and evaluation code are provided as a toolbox and benchmark at https://github.com/bethgelab/model-vs-human/. |
| Dataset Splits | No | The paper describes collecting human psychophysical data and evaluating pre-trained models on OOD datasets, but it does not specify train/validation/test splits for its own experimental setup or for the OOD datasets it collected. |
| Hardware Specification | No | The paper describes the monitor used for human psychophysical experiments ('a 22″ monitor with 1920 × 1200 pixels resolution (refresh rate: 120 Hz)') but does not specify the hardware used to run the model evaluations or training. |
| Software Dependencies | No | The paper mentions evaluating PyTorch and TensorFlow models and cites the PyTorch library, but does not provide specific version numbers for any software dependencies used in its experiments. |
| Experiment Setup | Yes | Stimuli were presented at the center of a 22″ monitor with 1920 × 1200 pixels resolution (refresh rate: 120 Hz). Viewing distance was 107 cm and target images subtended 3 × 3 degrees of visual angle. Human observers were presented with an image and asked to select the correct category out of 16 basic categories (such as chair, dog, airplane, etc.). Stimuli were balanced w.r.t. classes and presented in random order. ... presented images for 200 ms followed by a 1/f backward mask to limit the influence of recurrent processing. |
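The viewing geometry reported in the Experiment Setup row fixes the physical size of the stimuli on screen. A minimal sketch of that relationship, using standard visual-angle trigonometry (this is an illustrative derivation, not code from the paper or its toolbox):

```python
import math

def visual_angle_to_size(angle_deg: float, distance_cm: float) -> float:
    """Physical extent (cm) of a stimulus subtending `angle_deg`
    degrees of visual angle at viewing distance `distance_cm`."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)

# Paper's setup: target images subtend 3 x 3 degrees at 107 cm.
side_cm = visual_angle_to_size(3.0, 107.0)
print(f"stimulus side: {side_cm:.2f} cm")  # ≈ 5.60 cm
```

So each target image occupied roughly a 5.6 cm × 5.6 cm square on the monitor, which is how psychophysics labs keep retinal stimulus size constant across setups.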