Neural Attentive Circuits
Authors: Martin Weiss, Nasim Rahaman, Francesco Locatello, Chris Pal, Yoshua Bengio, Bernhard Schölkopf, Li Erran Li, Nicolas Ballas
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate qualitatively that NACs learn diverse and meaningful module configurations on the Natural Language and Visual Reasoning for Real (NLVR2) dataset without additional supervision. Quantitatively, we show that by incorporating modularity in this way, NACs improve upon a strong non-modular baseline in terms of low-shot adaptation on CIFAR and Caltech-UCSD Birds dataset (CUB) by about 10 percent, and OOD robustness on Tiny ImageNet-R by about 2.5 percent. Further, we find that NACs can achieve an 8x speedup at inference time while losing less than 3 percent performance. Finally, we find NACs to yield competitive results on diverse data modalities spanning point-cloud classification, symbolic processing and text classification from ASCII bytes, thereby confirming its general purpose nature. |
| Researcher Affiliation | Collaboration | Mila, Quebec AI Institute; Max Planck Institute for Intelligent Systems, Tübingen; AWS AI; Meta AI; Université de Montréal; Polytechnique Montréal; Canada CIFAR AI Chair |
| Pseudocode | No | The paper provides architectural diagrams and mathematical formulations but does not include explicit pseudocode blocks or algorithms. |
| Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We demonstrate qualitatively that NACs learn diverse and meaningful module configurations on the Natural Language and Visual Reasoning for Real (NLVR2) dataset without additional supervision. Quantitatively, we show that by incorporating modularity in this way, NACs improve upon a strong non-modular baseline in terms of low-shot adaptation on CIFAR and Caltech-UCSD Birds dataset (CUB) by about 10 percent, and OOD robustness on Tiny ImageNet-R by about 2.5 percent. Finally, we find NACs to yield competitive results on diverse data modalities spanning point-cloud classification, symbolic processing and text classification from ASCII bytes, thereby confirming its general purpose nature. |
| Dataset Splits | No | The paper mentions using validation sets (e.g., for Tiny-Image Net and NLVR2) and few-shot samples, but does not provide specific percentages, counts, or explicit splitting methodology (e.g., random seed) to reproduce the dataset splits for all experiments. |
| Hardware Specification | No | The paper mentions conducting experiments, including that NACs can support "more than a thousand modules on a single GPU", but does not specify the model or type of GPU, CPU, or any other specific hardware used. |
| Software Dependencies | No | The paper does not list specific software dependencies with their version numbers required to reproduce the experiments. |
| Experiment Setup | Yes | We pretrain all models on Tiny-ImageNet for 400 epochs, and perform model selection based on accuracy on the corresponding validation set. For ImageNet training, we use the same convolutional preprocessors as we do for Tiny-ImageNet. ... train an 8-layer deep NAC with a scale-free prior graph on full ImageNet for 110 epochs, which yields 77% validation accuracy with 1024 modules. ... In practice, we set τ to a small but non-zero value (e.g. 0.5) to allow for exploration. ... we generally initialize α = 0.1. (A configuration sketch of this reported setup appears below the table.) |
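The quoted setup can be collected into a single configuration for reference. The sketch below is a minimal reconstruction of the hyperparameters reported in the paper; the names (`NACConfig`, `TrainConfig`, `imagenet_run`) are illustrative assumptions and do not come from the authors' codebase.

```python
# Hypothetical configuration sketch assembled from the quoted experiment setup.
# All class and field names are illustrative, not taken from the paper's code.
from dataclasses import dataclass


@dataclass
class NACConfig:
    num_modules: int = 1024       # "1024 modules" reported for the full-ImageNet run
    depth: int = 8                # "8-layer deep NAC"
    prior_graph: str = "scale-free"
    tau: float = 0.5              # kept "small but non-zero ... to allow for exploration"
    alpha_init: float = 0.1       # "we generally initialize α = 0.1"


@dataclass
class TrainConfig:
    dataset: str = "tiny-imagenet"
    epochs: int = 400             # Tiny-ImageNet pretraining budget
    model_selection: str = "val_accuracy"


# Full-ImageNet variant reported in the paper: 110 epochs, ~77% validation accuracy.
imagenet_run = TrainConfig(dataset="imagenet", epochs=110)
```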