Systematic Visual Reasoning through Object-Centric Relational Abstraction
Authors: Taylor Webb, Shanka Subhra Mondal, Jonathan D. Cohen
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated OCRA on two challenging visual reasoning tasks, ART [51] and SVRT [13], specifically designed to probe systematic generalization of learned abstract rules, as well as a novel dataset, CLEVR-ART, involving more complex visual inputs (Figure 2). |
| Researcher Affiliation | Academia | Taylor W. Webb*, Department of Psychology, University of California, Los Angeles, Los Angeles, CA, taylor.w.webb@gmail.com; Shanka Subhra Mondal*, Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ, smondal@princeton.edu; Jonathan D. Cohen, Princeton Neuroscience Institute, Princeton University, Princeton, NJ, jdc@princeton.edu |
| Pseudocode | No | The paper provides schematics and descriptions of the model's components but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about the release of its source code, nor does it provide a link to a repository for the methodology described. |
| Open Datasets | Yes | For ART, we generated a dataset of random multi-object displays (see Figure S2), utilizing a publicly available repository of unicode character images (Footnote 2: https://github.com/bbvanexttechnologies/unicode-images-database), and We created a novel dataset based on ART using realistically rendered 3D shapes from CLEVR [21] |
| Dataset Splits | Yes | In the most difficult regime (m = 95), the training set consists of problems that are created using only 5 objects, and the test problems are created using the remaining 95 objects, whereas in the easiest regime (m = 0), both training and test problems are created from the complete set of 100 objects (though the arrangement of these objects is distinct, such that the training and test sets are still disjoint). |
| Hardware Specification | No | The paper mentions running experiments on the 'Princeton University Della cluster' but does not provide specific hardware details such as GPU or CPU models, memory, or processor types. |
| Software Dependencies | No | The paper mentions 'implementation was done using the Pytorch library [31]' but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | Images were resized to 128 × 128, and pixels were normalized to the range [0, 1]. For slot attention, we used K = 6 (K = 7 for CLEVR-ART) slots, T = 3 attention iterations per image, and a dimensionality of D = 64. For ART and CLEVR-ART, we used a default learning rate of 8e-5, and a batch size of 16. For SVRT, we used a learning rate of 4e-5, a batch size of 32, and trained for 2000 epochs on each task. |
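The reported setup can be collected into a single place for anyone attempting a reproduction. The sketch below is hypothetical (the paper releases no code): it only gathers the hyperparameters quoted above into configuration dictionaries, with an illustrative pixel-normalization helper; all names are our own.

```python
# Hypothetical reproduction config (not the authors' code), assembled from
# the hyperparameters reported in the paper's experiment setup.

ART_CONFIG = {
    "image_size": (128, 128),   # images resized to 128 x 128
    "pixel_range": (0.0, 1.0),  # pixels normalized to [0, 1]
    "num_slots": 6,             # K = 6 for ART (K = 7 for CLEVR-ART)
    "attention_iters": 3,       # T = 3 slot-attention iterations per image
    "slot_dim": 64,             # D = 64
    "learning_rate": 8e-5,      # default for ART and CLEVR-ART
    "batch_size": 16,
}

# SVRT uses the same model settings but a different training regime.
SVRT_CONFIG = {
    **ART_CONFIG,
    "learning_rate": 4e-5,
    "batch_size": 32,
    "epochs": 2000,             # trained for 2000 epochs per SVRT task
}


def normalize_pixels(pixels, max_value=255.0):
    """Scale raw 8-bit pixel values into the [0, 1] range, as described."""
    return [p / max_value for p in pixels]
```

Under these assumptions, the only differences between the two regimes are the learning rate, batch size, and epoch budget; everything else is shared.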