Recurrent neural network dynamical systems for biological vision
Authors: Wayne Soo, Aldo Battista, Puria Radmard, Xiao-Jing Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first simulate the networks for 200 ms (time steps 0 to 100) without any input so that they converge to some steady state spontaneous activity (spontaneous). A batch of images is then presented for 200 ms (time steps 100 to 200). During this time, the cross-entropy loss is computed for the last 60 ms (time steps 170 to 200) and combined using a log-weighted sum. Finally, the networks are simulated for another 200 ms (time steps 200 to 300), and the mean-squared error between activity in the last 20 ms (time steps 290 to 300) and spontaneous activity is added to the loss. This additional term encourages the networks to return to spontaneous activity after stimulus presentation. This loss function has been carefully designed to produce a mono-stable solution so that the network may perform accurate inference indefinitely across time, which we will elaborate on in the next section. We also provide a detailed ablation study of every coefficient and every term in Appendix D. Simulating these networks for 300 steps across time is computationally expensive. For comparison, we compute the total multiply-accumulate operations (MACs) for 79 well-known CNN models found in the torchvision.models library and compare them against their parameter counts (Figure 2C). Our largest model, CordsNet-R8, has approximately the same number of parameters as ResNet-18, the smallest model of the ResNet series. In contrast, our smallest model, CordsNet-R2, requires more MACs to simulate than ViT-H-14, the largest vision transformer in the model library currently. While training the models by computing and minimizing the loss function in equation (4) is inevitable, we can reduce the number of training iterations needed by carefully initializing our models. We do this in three computationally cheaper steps (Figure 2B). We first train a feedforward CNN model without the recurrent component and replace it with a one-time convolutional layer. We also include batch normalization here to improve training efficiency. (Sketches of this simulation protocol and the MAC comparison appear after the table.) |
| Researcher Affiliation | Academia | Wayne W.M. Soo, Department of Engineering, University of Cambridge (wmws2@cam.ac.uk); Aldo Battista, Center for Neural Science, New York University (aldo.battista@nyu.edu); Puria Radmard, Department of Engineering, University of Cambridge (pr450@cam.ac.uk); Xiao-Jing Wang, Center for Neural Science, New York University (xjwang@nyu.edu) |
| Pseudocode | No | The paper provides equations and describes algorithms in text, but it does not include a clearly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Yes, the link to our code is provided in the supplementary. |
| Open Datasets | Yes | We train our models on MNIST [70], Fashion-MNIST [71], CIFAR-10 [72], CIFAR-100 and ImageNet [73], each with dataset-specific augmentations [74] as detailed in Appendix C.2. |
| Dataset Splits | No | The paper states 'Test accuracies of all experiments can be found in Table 1 (for ImageNet, the validation accuracy is shown instead).' and references Figure 2D, which shows validation accuracy over epochs. However, it does not explicitly provide split percentages or absolute counts for the training, validation, and test sets. It implies standard splits by referencing common benchmark datasets but does not state them explicitly. |
| Hardware Specification | No | The paper mentions 'advances in computational power', notes that 'Simulating these networks for 300 steps across time is computationally expensive.', and provides 'time benchmarks in Appendix C', but it does not specify any particular GPU, CPU, or other hardware model used for the experiments. Appendix C.3 states that 'Experiments were performed on a cluster running Ubuntu 20.04' and mentions an 'NVIDIA GeForce RTX 3090 GPU' for timing benchmarks, but not for all experiments. |
| Software Dependencies | No | The paper references the torchvision.models library and uses Python for its toolkit, but it does not provide version numbers for these or any other key software components. |
| Experiment Setup | Yes | We train networks of four different sizes, named CordsNet-RX, where X ∈ {2, 4, 6, 8} represents the number of recurrent layers. Exact model specifications can be found in Appendix C.1, where we also review important design choices. We train our models on MNIST [70], Fashion-MNIST [71], CIFAR-10 [72], CIFAR-100 and ImageNet [73], each with dataset-specific augmentations [74] as detailed in Appendix C.2. The neuron time constant is set to 10 ms (constant for all neurons), and the network is simulated with 2 ms time steps. The loss function that we aim to minimize is computed as: loss = logspace(-3, 0, steps=30) * CEloss(output[170:200], labels) + 1e-3 * MSEloss(activity[290:300], spontaneous) (4). (A hedged PyTorch reading of this loss appears below the table.) |
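
To make the quoted simulation protocol concrete, here is a minimal sketch of the three-phase timeline (spontaneous activity, stimulus presentation, relaxation) using simple discretized rate dynamics. The callables `rnn_step`, `readout`, and `image_drive` are hypothetical stand-ins for the paper's recurrent convolutional layers, readout, and input pathway; only the phase boundaries, the 10 ms time constant, and the 2 ms time step come from the excerpts above.

```python
import torch

def simulate(rnn_step, readout, image_drive, batch, n_units,
             T=300, dt=2.0, tau=10.0):
    """Three-phase protocol: steps 0-99 spontaneous, 100-199 stimulus,
    200-299 relaxation back toward spontaneous activity."""
    alpha = dt / tau                  # Euler factor: 2 ms step, 10 ms tau
    r = torch.zeros(batch, n_units)   # unit activity
    activity, outputs = [], []
    for t in range(T):
        # The image drives the network only during time steps 100..199
        inp = image_drive() if 100 <= t < 200 else 0.0
        # Leaky rate dynamics: tau * dr/dt = -r + f(recurrent input + drive)
        r = r + alpha * (-r + torch.relu(rnn_step(r) + inp))
        activity.append(r)
        outputs.append(readout(r))
    spontaneous = activity[99]        # steady state at stimulus onset
    return torch.stack(outputs), torch.stack(activity), spontaneous
```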
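
Equation (4) in the Experiment Setup row combines a log-weighted cross-entropy over the last 30 stimulus time steps with a small penalty for failing to return to spontaneous activity. Below is one hedged PyTorch reading of that expression; the tensor shapes are assumptions, not specifications from the paper.

```python
import torch
import torch.nn.functional as F

def cordsnet_loss(outputs, labels, activity, spontaneous):
    """outputs: (T, B, classes); labels: (B,); activity: (T, B, N);
    spontaneous: (B, N) steady-state activity before stimulus onset."""
    # Log-spaced weights from 1e-3 up to 1 across time steps 170..199,
    # so the final time steps dominate the classification term
    w = torch.logspace(-3, 0, steps=30)
    ce = torch.stack([F.cross_entropy(outputs[t], labels)
                      for t in range(170, 200)])
    ce_term = (w * ce).sum()
    # Penalize departures from spontaneous activity in steps 290..299,
    # encouraging relaxation back to baseline after stimulus offset
    mse_term = F.mse_loss(activity[290:300],
                          spontaneous.unsqueeze(0).expand(10, -1, -1))
    return ce_term + 1e-3 * mse_term
```

Weighting the cross-entropy toward the latest time steps is consistent with the paper's stated goal of a mono-stable solution that keeps inference accurate as the simulation runs on.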
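
The MAC comparison in the Research Type row (Figure 2C) can be reproduced in spirit with an off-the-shelf FLOP counter. The sketch below uses fvcore's `FlopCountAnalysis` as one assumed tooling choice; the paper does not say which counter it used, and the CordsNet models themselves are not in torchvision, so only the reference models named in the excerpt are shown.

```python
import torch
import torchvision.models as models
from fvcore.nn import FlopCountAnalysis  # assumed counter; not from the paper

# Parameter counts vs. multiply-accumulate operations for the two
# torchvision reference points mentioned in the excerpt
for name in ["resnet18", "vit_h_14"]:
    model = getattr(models, name)().eval()
    params = sum(p.numel() for p in model.parameters())
    # fvcore reports roughly one "flop" per multiply-accumulate
    macs = FlopCountAnalysis(model, torch.randn(1, 3, 224, 224)).total()
    print(f"{name}: {params / 1e6:.1f}M params, {macs / 1e9:.2f}G MACs")
```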