Learning to reason over visual objects

Authors: Shanka Subhra Mondal, Taylor Whittington Webb, Jonathan Cohen

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We found that a simple model, consisting only of an object-centric encoder and a transformer reasoning module, achieved state-of-the-art results on two challenging RPM-like benchmarks (PGM and I-RAVEN), as well as a novel benchmark with greater visual complexity (CLEVR-Matrices)."
Researcher Affiliation | Academia | Shanka Subhra Mondal*, Princeton University, Princeton, NJ (smondal@princeton.edu); Taylor W. Webb*, University of California, Los Angeles, Los Angeles, CA (taylor.w.webb@gmail.com); Jonathan D. Cohen, Princeton University, Princeton, NJ (jdc@princeton.edu)
Pseudocode | No | The paper describes the model and its components in detail but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "All code can be downloaded from https://github.com/Shanka123/STSN."
Open Datasets | Yes | "The PGM dataset was introduced by Barrett et al. (2018)... The RAVEN dataset was introduced by Zhang et al. (2019a)... We created a novel dataset of RPM-like problems using realistically rendered 3D shapes, based on source code from CLEVR (a popular visual-question-answering dataset) (Johnson et al., 2017)... The CLEVR-Matrices dataset can be downloaded from https://dataspace.princeton.edu/handle/88435/dsp01fq977z011."
Dataset Splits | Yes | "Each regime consists of 1.2M training problems, 20K validation problems, and 200K testing problems" (PGM); "There are a total of 42K training problems, 14K validation problems, and 14K testing problems" (I-RAVEN); "We generated 20K problems for each type, including 16K for training, 2K for validation, and 2K for testing" (CLEVR-Matrices).
Hardware Specification | Yes | Table 11 gives hardware specifications for all datasets: I-RAVEN: 1 A100, 40GB RAM; PGM-Neutral: 6 A100, 40GB RAM; PGM-Interpolation: 6 A100, 40GB RAM; PGM-Extrapolation: 6 A100, 40GB RAM; CLEVR-Matrices: 8 A100, 80GB RAM.
Software Dependencies | No | The paper mentions that "all experiments were performed using the Pytorch library (Paszke et al., 2017)" but does not give version numbers for PyTorch or any other software dependency.
Experiment Setup | Yes | "We give a detailed characterization of all hyperparameters and training details for our models in Section A.2." Table 9 lists hyperparameters for the transformer reasoning module; Table 10 lists training details for all datasets. "We used a reconstruction loss weight of λ = 1000 for all datasets. We used the ADAM optimizer (Kingma & Ba, 2014)."
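The reported training setup (Adam optimizer, reconstruction loss weight λ = 1000, PyTorch) can be illustrated with a minimal sketch. Note that `ToyModel`, its layer sizes, the learning rate, and the cross-entropy task loss are placeholders, not the authors' actual STSN architecture (which is available in their repository); only the optimizer choice and the λ = 1000 weighting are taken from the paper.

```python
import torch
from torch import nn, optim

class ToyModel(nn.Module):
    """Placeholder stand-in for STSN: an encoder with a classification
    head (task loss) and a decoder head (reconstruction loss)."""
    def __init__(self, dim=64, num_answers=8):
        super().__init__()
        self.encoder = nn.Linear(dim, dim)
        self.decoder = nn.Linear(dim, dim)            # reconstruction head
        self.classifier = nn.Linear(dim, num_answers)  # answer scores

    def forward(self, x):
        z = torch.relu(self.encoder(x))
        return self.classifier(z), self.decoder(z)

model = ToyModel()
optimizer = optim.Adam(model.parameters(), lr=1e-4)  # ADAM, as reported
recon_weight = 1000.0  # λ = 1000, the reconstruction loss weight from the paper

# One illustrative training step on random data.
x = torch.randn(4, 64)
target = torch.randint(0, 8, (4,))
logits, recon = model(x)
loss = (nn.functional.cross_entropy(logits, target)
        + recon_weight * nn.functional.mse_loss(recon, x))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The key point is that the reconstruction term is scaled by λ before being added to the task loss, so with λ = 1000 the reconstruction objective dominates the gradient early in training.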