Systematic Visual Reasoning through Object-Centric Relational Abstraction

Authors: Taylor Webb, Shanka Subhra Mondal, Jonathan D. Cohen

NeurIPS 2023

Reproducibility assessment (variable: result, with supporting excerpt from the paper):
Research Type: Experimental (4 experiments). Excerpt: "We evaluated OCRA on two challenging visual reasoning tasks, ART [51] and SVRT [13], specifically designed to probe systematic generalization of learned abstract rules, as well as a novel dataset, CLEVR-ART, involving more complex visual inputs (Figure 2)."
Researcher Affiliation: Academia. Taylor W. Webb*, Department of Psychology, University of California, Los Angeles, Los Angeles, CA (taylor.w.webb@gmail.com); Shanka Subhra Mondal*, Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ (smondal@princeton.edu); Jonathan D. Cohen, Princeton Neuroscience Institute, Princeton University, Princeton, NJ (jdc@princeton.edu).
Pseudocode: No. The paper provides schematics and descriptions of the model's components but does not include any structured pseudocode or algorithm blocks.
Open Source Code: No. The paper does not contain an explicit statement about the release of its source code, nor does it provide a link to a repository for the methodology described.
Open Datasets: Yes. Excerpts: "For ART, we generated a dataset of random multi-object displays (see Figure S2), utilizing a publicly available repository of unicode character images" (footnote: https://github.com/bbvanexttechnologies/unicode-images-database) and "We created a novel dataset based on ART using realistically rendered 3D shapes from CLEVR [21]".
Dataset Splits: Yes. Excerpt: "In the most difficult regime (m = 95), the training set consists of problems that are created using only 5 objects, and the test problems are created using the remaining 95 objects, whereas in the easiest regime (m = 0), both training and test problems are created from the complete set of 100 objects (though the arrangement of these objects is distinct, such that the training and test sets are still disjoint)."
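The held-out-object regime described in that excerpt can be sketched as a simple partition over object indices. This is an illustrative reconstruction, not the authors' code; the function name, seeding, and shuffle step are assumptions.

```python
import random

def split_objects(num_objects=100, m=95, seed=0):
    """Partition object indices into train/test pools for a holdout regime.

    Hypothetical sketch of the split described in the paper: with m = 95,
    training problems use only 5 objects and test problems use the
    remaining 95; with m = 0, both draw from all 100 objects (the problem
    arrangements, not the object pools, are then kept disjoint).
    """
    rng = random.Random(seed)
    objects = list(range(num_objects))
    rng.shuffle(objects)
    if m == 0:
        # Same object pool for train and test; disjointness comes from
        # using distinct arrangements when generating problems.
        return objects, objects
    train_pool = objects[: num_objects - m]
    test_pool = objects[num_objects - m :]
    return train_pool, test_pool

train_pool, test_pool = split_objects(m=95)
```

With m = 95 this yields a 5-object training pool and a disjoint 95-object test pool, matching the hardest regime in the excerpt.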
Hardware Specification: No. The paper mentions running experiments on the "Princeton University Della cluster" but does not provide specific hardware details such as GPU or CPU models, memory, or processor types.
Software Dependencies: No. The paper states that "implementation was done using the Pytorch library [31]" but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup: Yes. Excerpt: "Images were resized to 128 × 128, and pixels were normalized to the range [0, 1]. For slot attention, we used K = 6 (K = 7 for CLEVR-ART) slots, T = 3 attention iterations per image, and a dimensionality of D = 64. For ART and CLEVR-ART, we used a default learning rate of 8e-5, and a batch size of 16. For SVRT, we used a learning rate of 4e-5, a batch size of 32, and trained for 2000 epochs on each task."
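The reported setup can be collected into a small configuration sketch. The constant names and the `normalize_pixels` helper are illustrative assumptions; only the numeric values come from the excerpt above.

```python
# Hedged sketch of the preprocessing and hyperparameters reported in the
# paper; names are hypothetical, values are taken from the quoted setup.

IMG_SIZE = 128                                  # images resized to 128 x 128
NUM_SLOTS = {"ART": 6, "SVRT": 6, "CLEVR-ART": 7}  # K slots per task
SLOT_ITERS = 3                                  # T attention iterations per image
SLOT_DIM = 64                                   # slot dimensionality D
LEARNING_RATE = {"ART": 8e-5, "CLEVR-ART": 8e-5, "SVRT": 4e-5}
BATCH_SIZE = {"ART": 16, "CLEVR-ART": 16, "SVRT": 32}
SVRT_EPOCHS = 2000                              # epochs per SVRT task

def normalize_pixels(pixels):
    """Scale raw 8-bit pixel values into the range [0, 1]."""
    return [p / 255.0 for p in pixels]

normalized = normalize_pixels([0, 128, 255])
```

In the paper's actual pipeline this normalization and resizing would be applied to image tensors (the authors used PyTorch), but the scalar version above captures the same [0, 1] scaling.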