Characterizing the Ventral Visual Stream with Response-Optimized Neural Encoding Models
Authors: Meenakshi Khosla, Keith Jamison, Amy Kuceyeski, Mert Sabuncu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the strong generalization abilities of these models on artificial stimuli and novel datasets. Intriguingly, we find that response-optimized models trained towards the ventral-occipital and lateral-occipital areas, but not early visual areas, can recapitulate complex visual behaviors like object categorization and perceived image-similarity in humans. We further probe the trained networks to reveal representational biases in different visual areas and generate experimentally testable hypotheses. |
| Researcher Affiliation | Academia | (1) Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139; (2) Nancy E. and Peter C. Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY 14853; (3) Radiology, Weill Cornell Medicine, New York, NY 10065; (4) Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY 10065 |
| Pseudocode | No | No pseudocode or algorithm blocks are provided in the paper. |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] Yes, the code is included in the Supplementary Material. |
| Open Datasets | Yes | Natural Scenes Dataset: A detailed description of the Natural Scenes Dataset (NSD; http://naturalscenesdataset.org) is provided elsewhere [33] (see also the Appendix). The dataset contains measurements of fMRI responses from 8 participants who each viewed 9,000–10,000 natural scenes. A special set of 1,000 images was shared across subjects; the remaining images were mutually exclusive. |
| Dataset Splits | Yes | We used 4 NSD subjects for training and reserved the other 4 subjects for studying generalization. The first group collectively saw 37,000 natural scene images, including the 1,000 shared images. We used the 1,000 shared images for testing our models and split the remaining stimulus set into 35,000 training and 2,000 validation images. |
| Hardware Specification | Yes | All models were trained, validated and analysed on an NVIDIA Titan Xp GPU. |
| Software Dependencies | No | The paper mentions software components like CNNs and specific network architectures (e.g., AlexNet, ResNet-50) but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | All parameters of the neural encoding model were optimized jointly to minimize the masked mean squared error between the predicted and measured fMRI response. The regularization parameter was optimized independently for each subject, each layer, each model and for voxels in each visual area by testing among 8 log-spaced values in [1e-4, 1e4]. The shared CNN feature extractor consists of four convolutional blocks, with each block comprising the following feedforward computations: two 3 × 3 convolutional layers, each followed by an inner batch norm, and a ReLU; and an anti-aliased AvgPool operation (stride = 2) at the end. |
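
The Experiment Setup row names two quantities that are easy to misread: a masked mean squared error and a grid of 8 log-spaced regularization values in [1e-4, 1e4]. The following is a minimal NumPy sketch of both, not the authors' code. The function name `masked_mse`, the array shapes, and the mask convention (`True` marks a response that counts toward the loss, for example a voxel that was measured for that image) are assumptions made for illustration; the paper's exact masking scheme is not described in the rows above.

```python
import numpy as np


def masked_mse(pred, target, mask):
    """Mean squared error averaged only over entries where mask is True.

    pred, target: arrays of shape (n_images, n_voxels) -- assumed layout.
    mask: boolean array of the same shape; False entries are ignored
          (hypothetical convention, e.g. responses that were not measured).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mask = np.asarray(mask, dtype=bool)

    n_valid = mask.sum()
    if n_valid == 0:
        raise ValueError("mask selects no entries")

    # Zero out masked entries so that NaNs in ignored targets cannot leak in.
    sq_err = np.where(mask, (pred - target) ** 2, 0.0)
    return sq_err.sum() / n_valid


# Eight log-spaced candidate values spanning [1e-4, 1e4], endpoints included.
reg_grid = np.logspace(-4, 4, num=8)
```

Dividing by the number of valid entries, rather than by the full array size, keeps the loss scale comparable across batches with different amounts of missing data. With 8 points over 8 decades, adjacent grid values differ by a factor of 10^(8/7), so the grid does not land on exact powers of ten except at the endpoints.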