Flexible Context-Driven Sensory Processing in Dynamical Vision Models

Authors: Lakshmi Narasimhan Govindarajan, Abhiram Iyer, Valmiki Kothare, Ila Fiete

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We study this Dynamical Cortical network (DCnet) in a visual cue-delay-search task and show that the model uses its own cue representations to adaptively modulate its perceptual responses to solve the task, outperforming state-of-the-art DNN vision and LLM models.
Researcher Affiliation | Academia | Lakshmi Narasimhan Govindarajan 1,3,4, Abhiram Iyer 2,3,4, Valmiki Kothare 1, and Ila Fiete 1,3,4. 1 Department of Brain and Cognitive Sciences, MIT, Cambridge, MA; 2 Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA; 3 McGovern Institute for Brain Research, MIT, Cambridge, MA; 4 K. Lisa Yang Integrative Computational Neuroscience (ICoN), MIT, Cambridge, MA
Pseudocode | No | The paper provides mathematical equations describing the model's dynamics in Appendix A.1 but does not present them in a structured pseudocode or algorithm-block format.
Open Source Code | Yes | Code and datasets can be found here: Project repository.
Open Datasets | Yes | We draw inspiration from Clevr [48], a synthetic dataset for language-mediated visual reasoning, and construct vis-count, a parametric visually-cued, delayed search task.
Dataset Splits | Yes | In total, our training (validation) dataset comprised 384K (38K) trials. (A split sketch appears after this table.)
Hardware Specification | Yes | All models were trained on A100 GPUs for 100 epochs each.
Software Dependencies | No | The paper mentions using the AdamW optimizer and a one-cycle learning rate scheduler, but it does not specify software dependencies such as Python, PyTorch/TensorFlow, or other libraries with version numbers.
Experiment Setup | Yes | We used an AdamW optimizer (momentum = 0.9, β1 = 0.9, β2 = 0.999) and a one-cycle learning rate scheduler with a warm-up period of 30 epochs and a maximum learning rate of 4e-4. DCnet was 4 layers deep (~1.8M learnable parameters) and was trained with batches of 256 samples. (A training-setup sketch appears after this table.)
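
The split sizes above pin down the partition but not how it was drawn. Below is a minimal sketch, assuming PyTorch, of one way to realize the reported 384K/38K train/validation split; the index-only TensorDataset stand-in and the fixed seed are illustrative assumptions, not details from the paper.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# The paper reports 384K training and 38K validation trials but not how
# the partition was drawn; this index-only dataset is a stand-in for the
# actual vis-count trials.
n_train, n_val = 384_000, 38_000
trials = TensorDataset(torch.arange(n_train + n_val))

# One plausible realization of the reported split (the seed is an assumption,
# included only so the partition is reproducible).
gen = torch.Generator().manual_seed(0)
train_set, val_set = random_split(trials, [n_train, n_val], generator=gen)

print(len(train_set), len(val_set))  # 384000 38000
```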
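
For the reported experiment setup, the following is a minimal PyTorch sketch of the described optimization recipe: AdamW with betas (0.9, 0.999), a one-cycle schedule peaking at 4e-4 with a 30-epoch warm-up over a 100-epoch run, and batches of 256. The placeholder model, the dummy data, and the mapping of the warm-up onto pct_start = 30/100 with per-batch scheduler stepping are assumptions; the paper does not state its framework or these wiring details.

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import OneCycleLR

# Placeholder standing in for the 4-layer, ~1.8M-parameter DCnet,
# whose dynamics are given in the paper's Appendix A.1.
model = nn.Linear(28 * 28, 10)

epochs = 100
batch_size = 256
steps_per_epoch = 384_000 // batch_size  # 1500 batches per epoch over 384K trials

# AdamW with the reported betas; AdamW has no separate momentum argument,
# so the quoted momentum=0.9 corresponds to beta1 here.
optimizer = AdamW(model.parameters(), betas=(0.9, 0.999))

# One-cycle schedule with max_lr=4e-4; pct_start=30/100 spends the first
# 30 of 100 epochs warming up, matching the reported warm-up period.
scheduler = OneCycleLR(
    optimizer,
    max_lr=4e-4,
    epochs=epochs,
    steps_per_epoch=steps_per_epoch,
    pct_start=30 / epochs,
)

# Dummy batch standing in for vis-count trials (shapes are illustrative).
x = torch.randn(batch_size, 28 * 28)
y = torch.randint(0, 10, (batch_size,))
loss_fn = nn.CrossEntropyLoss()

for epoch in range(epochs):
    for _ in range(steps_per_epoch):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        scheduler.step()  # OneCycleLR is stepped once per batch, not per epoch
```

Stepping the scheduler per batch is how OneCycleLR is designed to be used; under this assumed mapping the learning rate rises to 4e-4 over the first 1500 × 30 steps and anneals for the remaining 70 epochs.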