Metamers of neural networks reveal divergence from human perceptual systems

Authors: Jenelle Feather, Alex Durango, Ray Gonzalez, Josh McDermott

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To more thoroughly investigate their similarity to biological systems, we synthesized model metamers: stimuli that produce the same responses at some stage of a network's representation. We generated model metamers for natural stimuli by performing gradient descent on a noise signal, matching the responses of individual layers of image and audio networks to a natural image or speech signal. The resulting signals reflect the invariances instantiated in the network up to the matched layer. We then measured whether model metamers were recognizable to human observers, a necessary condition for the model representations to replicate those of humans.
Researcher Affiliation | Academia | Jenelle Feather (1,2,3), Alex Durango (1,2,3), Ray Gonzalez (1,2,3), Josh McDermott (1,2,3,4). 1 Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology; 2 McGovern Institute, Massachusetts Institute of Technology; 3 Center for Brains, Minds and Machines, Massachusetts Institute of Technology; 4 Speech and Hearing Bioscience and Technology, Harvard University. {jfeather,durangoa,raygon,jhm}@mit.edu
Pseudocode | No | The paper describes the metamer generation process in text and figures, but does not contain structured pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Example generation code and trained models: https://github.com/jenellefeather/model_metamers
Open Datasets | Yes | The auditory models were trained on a word recognition task similar to [8], using segments from the Wall Street Journal [40] and Spoken Wikipedia Corpora [41]. ImageNet-trained models were obtained from publicly available pretrained checkpoints. ... across each of the 16 MS-COCO categories...
Dataset Splits | Yes | There were 793 word classes sourced from 432 unique speakers, with 230,357 unique clips in the training set and 40,651 segments in the validation set (full details of the dataset construction are in Section S1.1).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper states that models and metamer generation were implemented in 'TensorFlow [32]', but does not provide specific version numbers for TensorFlow or any other software dependencies.
Experiment Setup | Yes | Metamer synthesis used 15,000 iterations of the Adam optimizer [33] with a learning rate of 0.001, with the exception of the VGGish Embedding (0.01) and Deep Speech (0.0001) models.
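The synthesis procedure the table refers to (gradient descent on a noise signal until it reproduces a layer's responses to a reference signal) can be sketched in miniature. The toy below substitutes a single random linear layer and plain gradient descent for the paper's trained deep networks and Adam optimizer (15,000 iterations, learning rate 0.001); the layer sizes, seed, step size, and iteration count are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "layer": a fixed random linear map from a 128-d signal to 64 responses.
# (The paper matches responses of trained image/audio network layers; a
# linear layer is the simplest stand-in.)
W = rng.standard_normal((64, 128)) / np.sqrt(128)

def layer(x):
    return W @ x

x_ref = rng.standard_normal(128)   # stands in for a natural image or sound
target = layer(x_ref)              # the layer responses to reproduce

# Metamer synthesis: start from noise, descend on the response mismatch.
x = rng.standard_normal(128)
lr = 0.5
for _ in range(2000):
    err = layer(x) - target
    x -= lr * (W.T @ err)          # gradient of 0.5 * ||layer(x) - target||^2

response_err = np.abs(layer(x) - target).max()   # near zero after descent
input_diff = np.linalg.norm(x - x_ref)           # stays large
print(response_err, input_diff)
```

Because the layer maps 128 input dimensions onto 64 responses, it has a null space: the synthesized x matches the responses almost exactly while remaining far from x_ref in the input domain. That null space plays the role of the network invariances the paper probes by asking whether humans still recognize such metamers.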