Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Learning to reason over visual objects
Authors: Shanka Subhra Mondal, Taylor Whittington Webb, Jonathan Cohen
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We found that a simple model, consisting only of an object-centric encoder and a transformer reasoning module, achieved state-of-the-art results on both of two challenging RPM-like benchmarks (PGM and I-RAVEN), as well as a novel benchmark with greater visual complexity (CLEVR-Matrices). |
| Researcher Affiliation | Academia | Shanka Subhra Mondal* Princeton University Princeton, NJ EMAIL Taylor W. Webb* University of California, Los Angeles Los Angeles, CA EMAIL Jonathan D. Cohen Princeton University Princeton, NJ EMAIL |
| Pseudocode | No | The paper describes the model and its components in detail but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | All code can be downloaded from https://github.com/Shanka123/STSN. |
| Open Datasets | Yes | The PGM dataset was introduced by Barrett et al. (2018)... The RAVEN dataset was introduced by Zhang et al. (2019a)... We created a novel dataset of RPM-like problems using realistically rendered 3D shapes, based on source code from CLEVR (a popular visual-question-answering dataset) (Johnson et al., 2017)... The CLEVR-Matrices dataset can be downloaded from https://dataspace.princeton.edu/handle/88435/dsp01fq977z011. |
| Dataset Splits | Yes | Each regime consists of 1.2M training problems, 20K validation problems, and 200K testing problems. (PGM)... There are a total of 42K training problems, 14K validation problems, and 14K testing problems. (I-RAVEN)... We generated 20K problems for each type, including 16K for training, 2K for validation, and 2K for testing. (CLEVR-Matrices) |
| Hardware Specification | Yes | Table 11: Hardware specifications for all datasets. I-RAVEN: 1 A100, 40GB RAM; PGM-Neutral: 6 A100, 40GB RAM; PGM-Interpolation: 6 A100, 40GB RAM; PGM-Extrapolation: 6 A100, 40GB RAM; CLEVR-Matrices: 8 A100, 80GB RAM. |
| Software Dependencies | No | The paper mentions 'all experiments were performed using the Pytorch library (Paszke et al., 2017)' but does not specify its version number or other software dependencies with specific version numbers. |
| Experiment Setup | Yes | We give a detailed characterization of all hyperparameters and training details for our models in Section A.2. Table 9: Hyperparameters for Transformer Reasoning Module. Table 10: Training details for all datasets. We used a reconstruction loss weight of λ = 1000 for all datasets. We used the ADAM optimizer (Kingma & Ba, 2014). |