Adaptive recurrent vision performs zero-shot computation scaling to unseen difficulty levels

Authors: Vijay Veerabadran, Srinivas Ravishankar, Yuan Tang, Ritik Raina, Virginia de Sa

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this study, we investigate a critical functional role of such adaptive processing using recurrent neural networks: to dynamically scale computational resources conditional on input requirements, allowing zero-shot generalization to novel difficulty levels not seen during training, using two challenging visual reasoning tasks: PathFinder and Mazes. We combine convolutional recurrent neural networks (ConvRNNs) with a learnable halting mechanism based on Graves (2016). We show that 1) AdRNNs learn to dynamically halt processing early (or late) to solve easier (or harder) problems, and 2) these RNNs zero-shot generalize to more difficult problem settings not shown during training by dynamically increasing the number of recurrent iterations at test time. Our study provides modeling evidence supporting the hypothesis that recurrent processing enables the functional advantage of adaptively allocating compute resources conditional on input requirements, thereby allowing generalization to harder difficulty levels of a visual reasoning problem without training on them.
Researcher Affiliation | Academia | Vijay Veerabadran1, Srinivas Ravishankar1, Yuan Tang1, Ritik Raina1, Virginia R. de Sa1,2. 1Department of Cognitive Science, 2Halıcıoğlu Data Science Institute, University of California, San Diego, La Jolla, CA 92093. vveeraba@ucsd.edu
Pseudocode | No | The paper includes mathematical equations describing the model's dynamics but does not present pseudocode or a clearly labeled algorithm block.
Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the methodology is openly available.
Open Datasets | Yes | We investigate the effectiveness of AdRNNs on two challenging, publicly available, visual reasoning tasks based on curve tracing and route segmentation, namely PathFinder (introduced by Linsley et al. (2018)) and Mazes (introduced by Schwarzschild et al. (2021)).
Dataset Splits | No | We analyzed the number of steps chosen by the model before halting for each example from the validation sets of PathFinder and Mazes, with varying contour lengths and maze sizes respectively (Section 5.1). While a validation set is mentioned, its size and split ratio are not provided; the paper only confirms that one exists.
Hardware Specification | Yes | All models were trained on NVIDIA RTX A6000 GPUs and implemented using PyTorch (Paszke et al., 2017).
Software Dependencies | No | The paper mentions PyTorch as the implementation framework, but it does not specify a version number for PyTorch or any other software dependency.
Experiment Setup | Yes | On Mazes, the training minibatch size is set to 64 images (with an inference batch size of 50 images) and a learning rate schedule starting with warmup followed by step learning rate decay, as indicated in Schwarzschild et al. (2021), for 50 total epochs of training. On PathFinder, we set the training minibatch size to 256 images and a constant learning rate of 1e-4 for all models for a total of 20 epochs of training. The input convolutional layer's kernel size is 7×7. The number of channels used by the model remains unchanged across layers and is determined per model to match the overall number of trainable parameters across models. For PathFinder, we fix the number of channels to 32 for LocRNN, 21 for hConvGRU and ConvGRU, and 64 for ResNet-30, with a filter size of 9×9 in the intermediate recurrent layers. For Mazes, we fix the number of kernels (d) to 128 for LocRNN, ConvGRU, and hConvGRU, and 100 for ResNet-30 & R-ResNet-30, and the kernel size is fixed to 5×5.
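The hyperparameters quoted in the Experiment Setup row can be transcribed into a small configuration sketch for at-a-glance comparison across the two tasks. The dictionary keys below are illustrative names chosen here, not identifiers from the authors' code:

```python
# Hypothetical transcription of the reported hyperparameters; key names
# are illustrative, not taken from the paper's (unreleased) codebase.
MAZES_CONFIG = {
    "train_batch_size": 64,
    "inference_batch_size": 50,
    "lr_schedule": "warmup + step decay",  # per Schwarzschild et al. (2021)
    "epochs": 50,
    "input_conv_kernel": (7, 7),
    "channels": {"LocRNN": 128, "ConvGRU": 128, "hConvGRU": 128,
                 "ResNet-30": 100, "R-ResNet-30": 100},
    "recurrent_kernel": (5, 5),
}

PATHFINDER_CONFIG = {
    "train_batch_size": 256,
    "learning_rate": 1e-4,  # constant, all models
    "epochs": 20,
    "input_conv_kernel": (7, 7),
    "channels": {"LocRNN": 32, "hConvGRU": 21, "ConvGRU": 21, "ResNet-30": 64},
    "recurrent_kernel": (9, 9),
}
```

Channel counts differ per model because, as the quoted setup notes, they are chosen to approximately match the total number of trainable parameters across architectures.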
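The learnable halting mechanism quoted in the Research Type row follows Graves (2016)-style Adaptive Computation Time: the network accumulates a per-step halting probability and stops iterating once the cumulative probability exceeds 1 − ε, so easy inputs halt early and hard inputs run longer. A minimal sketch, assuming scalar states and illustrative function names (this is not the authors' implementation, which operates on convolutional hidden states):

```python
import math

def act_halting(step_fn, halt_fn, state, max_steps=20, eps=0.01):
    """Sketch of ACT-style adaptive halting (Graves, 2016).

    step_fn: one recurrent iteration (state -> state).
    halt_fn: maps a state to a halting probability in (0, 1).
    Runs until the cumulative halting probability reaches 1 - eps,
    returning the ponder-weighted mixture of intermediate states and
    the number of iterations used.
    """
    cum_p, weighted, steps = 0.0, 0.0, 0
    for steps in range(1, max_steps + 1):
        state = step_fn(state)
        p = halt_fn(state)
        if cum_p + p >= 1.0 - eps or steps == max_steps:
            # Final step receives the leftover probability mass.
            weighted += (1.0 - cum_p) * state
            break
        weighted += p * state
        cum_p += p
    return weighted, steps

# Toy example: the state decays each step, and the halting probability
# (a sigmoid of 1 - state) grows as the state settles.
out, n_steps = act_halting(
    step_fn=lambda s: 0.5 * s,
    halt_fn=lambda s: 1.0 / (1.0 + math.exp(-(1.0 - s))),
    state=1.0,
)
```

At test time, raising `max_steps` beyond the training budget is what lets such a model spend extra iterations on harder, unseen difficulty levels, which is the zero-shot computation scaling the paper studies.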