Neural encoding with visual attention
Authors: Meenakshi Khosla, Gia Ngo, Keith Jamison, Amy Kuceyeski, Mert Sabuncu
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using concurrent eye-tracking and functional Magnetic Resonance Imaging (fMRI) recordings from a large cohort of human subjects watching movies, we first demonstrate that leveraging gaze information, in the form of attentional masking, can significantly improve brain response prediction accuracy in a neural encoding model. Next, we propose a novel approach to neural encoding by including a trainable soft-attention module. Using our new approach, we demonstrate that it is possible to learn visual attention policies by end-to-end learning merely on fMRI response data, and without relying on any eye-tracking. Interestingly, we find that attention locations estimated by the model on independent data agree well with the corresponding eye fixation patterns, despite no explicit supervision to do so. |
| Researcher Affiliation | Academia | School of Electrical and Computer Engineering, Cornell University, Ithaca, NY 14853; Nancy E. and Peter C. Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY 14853; Radiology, Weill Cornell Medicine, New York, NY 10065; Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY 10065 |
| Pseudocode | No | The paper describes the architecture and operations using equations and descriptive text (e.g., the softmax normalization $A^{(i)} = \frac{\exp S^{(i)}}{\sum_{j=1}^{n} \exp S^{(j)}},\; i \in \{1, \dots, n\}$), but it does not present any structured pseudocode or algorithm blocks. A hedged code sketch of this soft-attention computation appears after the table. |
| Open Source Code | Yes | Our code is available at https://github.com/mk2299/encoding_attention. |
| Open Datasets | Yes | We study high-resolution 7T fMRI (TR = 1 s, voxel size = 1.6 mm isotropic) recordings of 158 participants from the Human Connectome Project (HCP) movie-watching database while they viewed 4 audio-visual movies in separate runs [13, 26]. |
| Dataset Splits | Yes | We train and validate our models on three movies using a 9:1 train-val split and leave the fourth movie for independent testing. This yields 2000 training, 265 validation and 699 test stimulus-response pairs. A sketch of this split protocol appears after the table. |
| Hardware Specification | No | The paper mentions "7T fMRI" and computational aspects such as "computational/memory constraints", but it does not specify the GPU or CPU models, or any other hardware used to run the experiments. |
| Software Dependencies | No | The paper mentions using Adam optimizer and ResNet-50 architecture, but it does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | All parameters were optimized to minimize the mean squared error between the predicted and target fMRI response using Adam [18] for 25 epochs with a learning rate of 1e-4. Validation curves were monitored to ensure convergence and hyperparameters were optimized on the validation set. A training-loop sketch follows the table. |
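
The softmax expression quoted in the Pseudocode row normalizes per-location saliency scores $S^{(i)}$ into an attention map $A^{(i)}$ that reweights the visual features. As a hedged illustration only (the paper does not name its framework; PyTorch, the class name `SoftAttention`, and the 1×1-convolution scoring head are assumptions here), a minimal spatial soft-attention layer could look like:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftAttention(nn.Module):
    """Hypothetical spatial soft-attention over CNN feature maps.

    Computes a scalar saliency score S^(i) per spatial location i,
    normalizes with a softmax, A^(i) = exp(S^(i)) / sum_j exp(S^(j)),
    and reweights the features with the resulting attention map.
    """
    def __init__(self, in_channels: int):
        super().__init__()
        # 1x1 conv maps each location's feature vector to a scalar score
        self.score = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, channels, height, width)
        b, c, h, w = feats.shape
        s = self.score(feats).view(b, -1)          # scores S^(i), shape (b, h*w)
        a = F.softmax(s, dim=1).view(b, 1, h, w)   # attention map A^(i)
        return feats * a                           # attention-weighted features
```

In the paper's setup such a module is trained end-to-end against fMRI responses; this sketch shows only the scoring, normalization, and reweighting step.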
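For the Dataset Splits row, the stated protocol is to train and validate on three movies with a 9:1 split and hold out the fourth movie for testing. A minimal sketch under that assumption (the function name `split_hcp_movies` and the per-movie list layout are hypothetical, not from the paper):

```python
import random

def split_hcp_movies(movies, val_frac=0.1, seed=0):
    """Train/val on the first three movies (9:1 split); fourth movie held out."""
    trainval = [pair for movie in movies[:3] for pair in movie]
    random.Random(seed).shuffle(trainval)
    n_val = int(len(trainval) * val_frac)
    return trainval[n_val:], trainval[:n_val], movies[3]

# train, val, test = split_hcp_movies(movies)  # paper reports 2000/265/699 pairs
```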
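The Experiment Setup row fully specifies the optimization: Adam, learning rate 1e-4, mean squared error, 25 epochs, with validation curves monitored. A minimal training-loop sketch under the assumption of PyTorch (`model`, `train_loader`, and `val_loader` are placeholders, not from the paper):

```python
import torch

# Hedged sketch of the stated setup: Adam, lr 1e-4, MSE loss, 25 epochs.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = torch.nn.MSELoss()

for epoch in range(25):
    model.train()
    for stimulus, response in train_loader:        # movie frames -> fMRI targets
        optimizer.zero_grad()
        loss = loss_fn(model(stimulus), response)  # predicted vs. measured response
        loss.backward()
        optimizer.step()

    # Monitor validation loss each epoch to check convergence,
    # mirroring the paper's use of validation curves for tuning.
    model.eval()
    with torch.no_grad():
        val_loss = sum(loss_fn(model(x), y).item()
                       for x, y in val_loader) / len(val_loader)
    print(f"epoch {epoch}: val MSE = {val_loss:.4f}")
```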