PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning
Authors: Yining Hong, Li Yi, Josh Tenenbaum, Antonio Torralba, Chuang Gan
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We examine several state-of-the-art visual reasoning models on this dataset and observe that they still make many surprising mistakes in situations where humans can easily infer the correct answer. We analyze a suite of state-of-the-art visual reasoning models on the PTR dataset and find that they all struggle with it, especially in relational, analogical, and physical reasoning. |
| Researcher Affiliation | Collaboration | Yining Hong (UCLA); Li Yi (Stanford University); Joshua B. Tenenbaum (MIT BCS, CBMM, CSAIL); Antonio Torralba (MIT CSAIL); Chuang Gan (MIT-IBM Watson AI Lab) |
| Pseudocode | No | The paper does not include any sections or figures explicitly labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | PTR dataset and baseline models are publicly available. Project page: http://ptr.csail.mit.edu/ |
| Open Datasets | Yes | Therefore, to better serve for part-based conceptual, relational and physical reasoning, we introduce a new large-scale diagnostic visual reasoning dataset named PTR. PTR dataset and baseline models are publicly available. Project page: http://ptr.csail.mit.edu/ |
| Dataset Splits | Yes | PTR includes approximately 52k images for training, 9k for validation and 10k for testing. The images are rendered via Blender. ... PTR contains approximately 520k questions for training, 90k for validation and 100k for testing. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions software such as 'Blender' and 'Bullet' but does not provide specific version numbers for these or any other ancillary software components, which are required for reproducibility. |
| Experiment Setup | Yes | Implementation Details: We use an ImageNet-pretrained ResNet-101 to extract 14 × 14 × 1024 feature maps for MAC, MAC(P) and LCGN. For CNN-LSTM, we use the 2048-dimensional feature from the last pooling layer. The setup of MDETR is the same as the original paper with ResNet-101 as backbone. We first train only the task of part detection for 30 epochs, and then train the full PTR with question answering loss. For NS-VQA, we use Mask R-CNN [18] to generate segmentation proposals of objects and parts, respectively. The Mask R-CNN is trained on 20% of the training data annotated with ground-truth masks for 30,000 iterations. We do not include labels of categories and attributes when training segmentation. We extract the categories and attributes of objects and parts using attribute networks (ResNet-34). |
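
The experiment-setup row above quotes the paper's feature-extraction recipe (14 × 14 × 1024 ResNet-101 feature maps for MAC/MAC(P)/LCGN, and a 2048-dimensional pooled feature for CNN-LSTM). Below is a minimal sketch of how such features could be obtained with torchvision; the 224 × 224 input size, the cut after ResNet-101's `layer3` (conv4_x), and the use of torchvision itself are assumptions for illustration, as the paper does not state these details.

```python
# Hedged sketch: extracting 14x14x1024 feature maps from an ImageNet-pretrained
# ResNet-101, as described for the MAC / MAC(P) / LCGN baselines. Input size
# (224x224) and the choice of layer3 as the cut point are assumptions.
import torch
import torch.nn as nn
import torchvision.models as models

resnet = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V1)
resnet.eval()

# Keep everything up to and including layer3 (conv4_x); its output has 1024
# channels and spatial size 14x14 for a 224x224 input.
feature_extractor = nn.Sequential(
    resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool,
    resnet.layer1, resnet.layer2, resnet.layer3,
)

with torch.no_grad():
    image = torch.randn(1, 3, 224, 224)            # placeholder batch of one image
    feats = feature_extractor(image)               # -> (1, 1024, 14, 14)
    pooled = resnet.avgpool(resnet.layer4(feats))  # 2048-d pooled feature, as used for CNN-LSTM
    pooled = torch.flatten(pooled, 1)              # -> (1, 2048)

print(feats.shape, pooled.shape)
```
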