Slot Abstractors: Toward Scalable Abstract Visual Reasoning

Authors: Shanka Subhra Mondal, Jonathan D. Cohen, Taylor Whittington Webb

ICML 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We evaluate the Slot Abstractor on five abstract visual reasoning tasks (including a task involving real-world images), with a diverse range of visual and rule complexity. We find that the Slot Abstractor is capable of strong systematic generalization of learned abstract rules, and can be scaled to problems with multiple rule types and more than 100 objects, significantly improving over a number of competitive baselines in most settings. |
| Researcher Affiliation | Academia | 1) Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ, US; 2) Princeton Neuroscience Institute, Princeton University, Princeton, NJ, US; 3) Department of Psychology, University of California Los Angeles, Los Angeles, CA, US. |
| Pseudocode | Yes | Figure 1 shows a schematic description of our approach (described in detail in Algorithm 1). ... Algorithm 1 Slot Abstractors: The inputs are the feature map obtained by passing an image x through a convolutional encoder, a position code (encoding the four cardinal directions) which is passed through a linear layer to generate positional embeddings, and K slots which are initialized from shared mean and variance parameters. (A sketch of these inputs appears after the table.) |
| Open Source Code | Yes | The code is available at https://github.com/Shanka123/Slot-Abstractor. |
| Open Datasets | Yes | We evaluated the Slot Abstractor on five challenging abstract visual reasoning datasets, ART (Webb et al., 2021), SVRT (Fleuret et al., 2011), CLEVR-ART (Webb et al., 2023b), PGM (Barrett et al., 2018), and V-PROM (Teney et al., 2020). |
| Dataset Splits | Yes | Each regime consists of 1.2M training problems, 20K validation problems, and 200K testing problems. ... In this work, we focused only on the Neutral regime, with around 139K training problems, 8K validation problems, and 73K test problems. |
| Hardware Specification | Yes | We used a single A100 GPU with 80GB memory for training on ART, SVRT, and CLEVR-ART tasks. For training on PGM regimes, we used 4 A100 GPUs with 80GB memory each... |
| Software Dependencies | No | For optimization, we used the ADAM (Kingma & Ba, 2014) optimizer, and for implementation, we used the PyTorch library (Paszke et al., 2017). (No version is specified for PyTorch.) |
| Experiment Setup | Yes | The hyperparameters for the Abstractor module of the Slot Abstractor are described in Table 14. ... For ART and CLEVR-ART tasks, we used a learning rate of 8e-5 and a batch size of 16; the number of training epochs is given in Table 15. For SVRT, we used a learning rate of 4e-5, a batch size of 32, and trained for 2000 epochs on each task. (A minimal training-loop sketch appears after the table.) |
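
The Algorithm 1 inputs quoted in the Pseudocode row (a convolutional feature map, positional embeddings produced by passing a four-direction position code through a linear layer, and K slots initialized from shared mean and variance parameters) match the standard Slot Attention input pipeline. The sketch below shows one plausible PyTorch construction of those inputs; the module name `SlotAbstractorInputs`, the encoder architecture, the exact position encoding, and all shapes are illustrative assumptions rather than the authors' released implementation (see the GitHub repository above for that).

```python
import torch
import torch.nn as nn

class SlotAbstractorInputs(nn.Module):
    """Illustrative sketch of the Algorithm 1 inputs: a CNN feature map,
    linear positional embeddings from a 4-direction position code, and
    K slots sampled from shared mean/variance parameters."""

    def __init__(self, feat_dim=64, slot_dim=64, num_slots=7):
        super().__init__()
        # Convolutional encoder producing a feature map (hypothetical architecture).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, 5, padding=2), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 5, padding=2), nn.ReLU(),
        )
        # Linear layer mapping the 4-d cardinal-direction position code
        # to positional embeddings added to the feature map.
        self.pos_proj = nn.Linear(4, feat_dim)
        # Shared mean and (log) scale parameters from which slots are initialized.
        self.slot_mu = nn.Parameter(torch.zeros(1, 1, slot_dim))
        self.slot_log_sigma = nn.Parameter(torch.zeros(1, 1, slot_dim))
        self.num_slots = num_slots
        self.slot_dim = slot_dim

    @staticmethod
    def build_position_code(h, w, device):
        # 4-channel position code over the spatial grid, one channel per
        # cardinal direction (the exact encoding is an assumption).
        ys = torch.linspace(0.0, 1.0, h, device=device)
        xs = torch.linspace(0.0, 1.0, w, device=device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        return torch.stack([gy, 1.0 - gy, gx, 1.0 - gx], dim=-1)  # (H, W, 4)

    def forward(self, x):
        b = x.shape[0]
        feats = self.encoder(x)                        # (B, C, H, W)
        _, c, h, w = feats.shape
        feats = feats.permute(0, 2, 3, 1)              # (B, H, W, C)
        pos = self.build_position_code(h, w, x.device)
        feats = feats + self.pos_proj(pos)             # add positional embeddings
        feats = feats.reshape(b, h * w, c)             # flatten the spatial grid
        # Sample K slots from the shared Gaussian parameters.
        sigma = self.slot_log_sigma.exp()
        slots = self.slot_mu + sigma * torch.randn(
            b, self.num_slots, self.slot_dim, device=x.device)
        return feats, slots
```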
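
The Software Dependencies and Experiment Setup rows specify the Adam optimizer with a learning rate of 8e-5 and batch size 16 for ART/CLEVR-ART, and 4e-5 with batch size 32 and 2000 epochs for SVRT. A minimal training-loop sketch under those settings is shown below; the synthetic data, placeholder model, and loss choice are assumptions made only so the example runs end to end, and are not part of the reported setup.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in data and a placeholder model: only the optimizer
# settings (Adam, lr 8e-5, batch size 16 for ART/CLEVR-ART) come from the
# reported setup; for SVRT the paper reports lr 4e-5, batch size 32, 2000 epochs.
images = torch.randn(64, 3, 80, 80)   # dummy puzzle images
labels = torch.randint(0, 4, (64,))   # dummy answer indices
train_loader = DataLoader(TensorDataset(images, labels), batch_size=16, shuffle=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 80 * 80, 4))  # placeholder, not the Slot Abstractor
optimizer = torch.optim.Adam(model.parameters(), lr=8e-5)       # Adam, lr 8e-5
criterion = nn.CrossEntropyLoss()

num_epochs = 3  # the paper's epoch counts are task-dependent (Table 15)
for epoch in range(num_epochs):
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```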