Metamers of neural networks reveal divergence from human perceptual systems
Authors: Jenelle Feather, Alex Durango, Ray Gonzalez, Josh McDermott
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To more thoroughly investigate their similarity to biological systems, we synthesized model metamers: stimuli that produce the same responses at some stage of a network's representation. We generated model metamers for natural stimuli by performing gradient descent on a noise signal, matching the responses of individual layers of image and audio networks to a natural image or speech signal. The resulting signals reflect the invariances instantiated in the network up to the matched layer. We then measured whether model metamers were recognizable to human observers, a necessary condition for the model representations to replicate those of humans. |
| Researcher Affiliation | Academia | Jenelle Feather (1,2,3), Alex Durango (1,2,3), Ray Gonzalez (1,2,3), Josh McDermott (1,2,3,4). 1: Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology; 2: McGovern Institute, Massachusetts Institute of Technology; 3: Center for Brains, Minds and Machines, Massachusetts Institute of Technology; 4: Speech and Hearing Bioscience and Technology, Harvard University. {jfeather,durangoa,raygon,jhm}@mit.edu |
| Pseudocode | No | The paper describes the metamer generation process in text and figures, but does not contain structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Example generation code and trained models: https://github.com/jenellefeather/model_metamers |
| Open Datasets | Yes | The auditory models were trained on a word recognition task similar to [8], using segments from the Wall Street Journal [40] and Spoken Wikipedia Corpora [41]. ImageNet-trained models were obtained from publicly available pretrained checkpoints. ... across each of the 16 MS-COCO categories... |
| Dataset Splits | Yes | There were 793 word classes sourced from 432 unique speakers, with 230357 unique clips in the training set and 40651 segments in the validation set (full details of the dataset construction are in Section S1.1). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper states that models and metamer generation were implemented in 'TensorFlow [32]', but does not provide specific version numbers for TensorFlow or any other software dependencies. |
| Experiment Setup | Yes | Metamer synthesis used 15000 iterations of the Adam optimizer [33] with a learning rate of 0.001, with the exception of the VGGish Embedding (0.01) and Deep Speech (0.0001) models. |
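The synthesis procedure quoted above (gradient descent on a noise signal to match a layer's responses, using Adam with a learning rate of 0.001) can be sketched as a short optimization loop. The paper's implementation was in TensorFlow; the sketch below is a PyTorch re-illustration with a toy stand-in layer and fewer iterations, so everything except the optimizer choice and learning rate is an assumption, not the authors' code.

```python
import torch

torch.manual_seed(0)

# Stand-in for one stage of a trained network: a fixed (frozen) linear map + ReLU.
layer = torch.nn.Sequential(torch.nn.Linear(64, 32), torch.nn.ReLU())
for p in layer.parameters():
    p.requires_grad_(False)

natural = torch.randn(64)          # stand-in for a natural image or speech signal
target = layer(natural).detach()   # layer responses the metamer must reproduce

metamer = torch.randn(64, requires_grad=True)    # start from a noise signal
opt = torch.optim.Adam([metamer], lr=0.001)      # learning rate from the paper

init_loss = torch.nn.functional.mse_loss(layer(metamer), target).item()
for _ in range(2000):              # the paper used 15000 iterations
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(layer(metamer), target)
    loss.backward()
    opt.step()
```

Because the matched layer discards information (here, the ReLU and the 64-to-32 projection), many different input signals can reproduce the target responses; the optimized `metamer` typically differs from `natural` while driving the loss toward zero, which is exactly the invariance the paper probes with human observers.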