Learning to reason over visual objects
Authors: Shanka Subhra Mondal, Taylor Whittington Webb, Jonathan Cohen
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We found that a simple model, consisting only of an object-centric encoder and a transformer reasoning module, achieved state-of-the-art results on both of two challenging RPM-like benchmarks (PGM and I-RAVEN), as well as a novel benchmark with greater visual complexity (CLEVR-Matrices). |
| Researcher Affiliation | Academia | Shanka Subhra Mondal* Princeton University Princeton, NJ smondal@princeton.edu Taylor W. Webb* University of California, Los Angeles Los Angeles, CA taylor.w.webb@gmail.com Jonathan D. Cohen Princeton University Princeton, NJ jdc@princeton.edu |
| Pseudocode | No | The paper describes the model and its components in detail but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | All code can be downloaded from https://github.com/Shanka123/STSN. |
| Open Datasets | Yes | The PGM dataset was introduced by Barrett et al. (2018)... The RAVEN dataset was introduced by Zhang et al. (2019a)... We created a novel dataset of RPM-like problems using realistically rendered 3D shapes, based on source code from CLEVR (a popular visual-question-answering dataset) (Johnson et al., 2017)... The CLEVR-Matrices dataset can be downloaded from https://dataspace.princeton.edu/handle/88435/dsp01fq977z011. |
| Dataset Splits | Yes | Each regime consists of 1.2M training problems, 20K validation problems, and 200K testing problems. (PGM)... There are a total of 42K training problems, 14K validation problems, and 14K testing problems. (I-RAVEN)... We generated 20K problems for each type, including 16K for training, 2K for validation, and 2K for testing. (CLEVR-Matrices) |
| Hardware Specification | Yes | Table 11: Hardware specifications for all datasets. I-RAVEN: 1× A100 (40GB); PGM-Neutral: 6× A100 (40GB); PGM-Interpolation: 6× A100 (40GB); PGM-Extrapolation: 6× A100 (40GB); CLEVR-Matrices: 8× A100 (80GB). |
| Software Dependencies | No | The paper states that 'all experiments were performed using the Pytorch library (Paszke et al., 2017)' but does not give version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We give a detailed characterization of all hyperparameters and training details for our models in Section A.2. Table 9: Hyperparameters for Transformer Reasoning Module. Table 10: Training details for all datasets. We used a reconstruction loss weight of λ = 1000 for all datasets. We used the ADAM optimizer (Kingma & Ba, 2014). |
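The Experiment Setup row reports a reconstruction loss weight of λ = 1000 combined with the ADAM optimizer. As a minimal sketch of how that weighting enters the training objective (the function name and the example loss values are illustrative assumptions, not taken from the paper's code):

```python
def combined_loss(task_loss: float, recon_loss: float, lam: float = 1000.0) -> float:
    """Sketch of a weighted training objective: task loss plus a
    lambda-weighted reconstruction term, with lambda = 1000 as
    reported in the paper's experiment setup."""
    return task_loss + lam * recon_loss

# Illustrative values only; with lambda = 1000 even a small
# reconstruction error contributes substantially to the total.
print(combined_loss(0.5, 0.001))  # 0.5 + 1000 * 0.001 = 1.5
```

In an actual PyTorch training loop this scalar-weighted sum would be computed on tensors before calling `backward()`, with the Adam optimizer stepping on the combined gradient.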