Learning Physical Graph Representations from Visual Scenes

Authors: Daniel Bear, Chaofei Fan, Damian Mrowca, Yunzhu Li, Seth Alter, Aran Nayebi, Jeremy Schwartz, Li F. Fei-Fei, Jiajun Wu, Josh Tenenbaum, Daniel L. Yamins

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 3 Experiments and Analysis. Datasets, Baselines, and Evaluation Metrics. We compare PSGNet to recent CNN-based object discovery methods based on the quality of the self-supervised scene segmentations that they learn on three datasets.
Researcher Affiliation | Academia | (1) Department of Psychology, Stanford University; (2) Department of Computer Science, Stanford University; (3) Wu Tsai Neurosciences Institute, Stanford University; (4) MIT CSAIL; (5) MIT Brain and Cognitive Sciences; (6) Neurosciences Ph.D. Program, Stanford University
Pseudocode | No | The paper describes procedures and architecture components but does not include any explicitly labeled pseudocode or algorithm blocks. It states 'formal definitions and implementation details can be found in the Supplement'.
Open Source Code | No | The paper does not contain any explicit statements about open-sourcing code or provide links to a code repository.
Open Datasets | Yes | We compare PSGNet to recent CNN-based object discovery methods based on the quality of the self-supervised scene segmentations that they learn on three datasets. Primitives is a synthetic dataset... Playroom is a synthetic dataset... Gibson is a subset of the data from the Gibson1.0 environment [3]... where [3] refers to 'Iro Armeni, Sasha Sax, Amir R Zamir, and Silvio Savarese. Joint 2D-3D-semantic data for indoor scene understanding. arXiv:1702.01105, 2017.'
Dataset Splits | No | The paper mentions 'held-out validation images' but does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology).
Hardware Specification | Yes | We thank Google (TPUv2 team) and the NVIDIA corporation for generous donation of hardware resources.
Software Dependencies | Yes | Images in Primitives and Playroom are generated by ThreeDWorld (TDW), a general-purpose, multi-modal simulation platform built on Unity Engine 2019.
Experiment Setup | Yes | We always self-supervise QTR outputs from all PSG levels with the RGB values and the backward temporal difference magnitudes of the PSGNet's input movie, using the standard L2 loss. We also self-supervise a set of QSR outputs from the top PSG level on the bottom-up scene segmentations SL... this uses a softmax cross-entropy loss... Finally, except where indicated, we supervise QTR renderings on actual depth and surface normal vector images provided by the training datasets...
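
The loss terms quoted in the Experiment Setup entry can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the L2 loss is assumed to be averaged over elements, and the "backward temporal difference magnitude" is assumed to be the per-pixel Euclidean norm of the RGB change from the previous frame. The paper's Supplement may define these differently.

```python
import math


def l2_loss(pred, target):
    """Mean squared error between two equal-length sequences of floats.

    Assumption: the 'standard L2 loss' is averaged over elements.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def backward_temporal_difference(frame_t, frame_prev):
    """Per-pixel magnitude of the change from the previous frame.

    Each frame is a list of (r, g, b) tuples. Assumption: the magnitude
    is the Euclidean norm over the three colour channels.
    """
    return [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(px_t, px_prev)))
        for px_t, px_prev in zip(frame_t, frame_prev)
    ]


def softmax_cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against an integer class label.

    Uses the log-sum-exp trick for numerical stability.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


# Tiny two-pixel example: only the second pixel changes between frames.
frame_prev = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
frame_t = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
delta_target = backward_temporal_difference(frame_t, frame_prev)

# Hypothetical rendered prediction of the temporal-difference channel.
delta_pred = [0.0, 0.5]
print("temporal-difference L2 term:", l2_loss(delta_pred, delta_target))

# Hypothetical segment logits for one pixel whose target segment is index 0.
print("segmentation term:", softmax_cross_entropy([2.0, 0.5, -1.0], 0))
```

The quoted text does not say how the terms are weighted against each other, so the sketch reports them separately instead of summing them.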