Learning Physical Graph Representations from Visual Scenes
Authors: Daniel Bear, Chaofei Fan, Damian Mrowca, Yunzhu Li, Seth Alter, Aran Nayebi, Jeremy Schwartz, Li F. Fei-Fei, Jiajun Wu, Josh Tenenbaum, Daniel L. Yamins
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 3 Experiments and Analysis. Datasets, Baselines, and Evaluation Metrics. We compare PSGNet to recent CNN-based object discovery methods based on the quality of the self-supervised scene segmentations that they learn on three datasets. |
| Researcher Affiliation | Academia | (1) Department of Psychology, Stanford University; (2) Department of Computer Science, Stanford University; (3) Wu Tsai Neurosciences Institute, Stanford University; (4) MIT CSAIL; (5) MIT Brain and Cognitive Sciences; (6) Neurosciences Ph.D. Program, Stanford University |
| Pseudocode | No | The paper describes procedures and architecture components but does not include any explicitly labeled pseudocode or algorithm blocks. It states 'formal definitions and implementation details can be found in the Supplement'. |
| Open Source Code | No | The paper does not contain any explicit statements about open-sourcing code or provide links to a code repository. |
| Open Datasets | Yes | We compare PSGNet to recent CNN-based object discovery methods based on the quality of the self-supervised scene segmentations that they learn on three datasets. Primitives is a synthetic dataset... Playroom is a synthetic dataset... Gibson is a subset of the data from the Gibson1.0 environment [3]... where [3] refers to 'Iro Armeni, Sasha Sax, Amir R Zamir, and Silvio Savarese. Joint 2d-3d-semantic data for indoor scene understanding. arXiv:1702.01105, 2017.' |
| Dataset Splits | No | The paper mentions 'held-out validation images' but does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology). |
| Hardware Specification | Yes | We thank Google (TPUv2 team) and the NVIDIA corporation for generous donation of hardware resources. |
| Software Dependencies | Yes | Images in Primitives and Playroom are generated by ThreeDWorld (TDW), a general-purpose, multi-modal simulation platform built on Unity Engine 2019. |
| Experiment Setup | Yes | We always self-supervise QTR outputs from all PSG levels with the RGB values and the backward temporal difference magnitudes of the PSGNet's input movie, using the standard L2 loss. We also self-supervise a set of QSR outputs from the top PSG level on the bottom-up scene segmentations SL... this uses a softmax cross-entropy loss... Finally, except where indicated, we supervise QTR renderings on actual depth and surface normal vector images provided by the training datasets... |
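
The loss terms quoted in the Experiment Setup row can be sketched roughly as follows. This is a minimal NumPy illustration and not the authors' implementation: the function names, the tensor shapes, the per-pixel norm used for the temporal difference magnitude, and the equal weighting of the terms are all assumptions, and the depth and surface normal supervision is omitted.

```python
import numpy as np


def l2_loss(pred, target):
    """Mean squared error over all elements (one common form of an L2 loss)."""
    return float(np.mean((pred - target) ** 2))


def backward_temporal_difference_magnitude(frames):
    """Per-pixel magnitude of frame_t - frame_{t-1}.

    frames: float array of shape (T, H, W, 3).
    Returns an array of shape (T, H, W, 1). The first frame has no
    predecessor, so its difference is set to zero (an assumption).
    """
    diff = np.zeros_like(frames)
    diff[1:] = frames[1:] - frames[:-1]
    return np.linalg.norm(diff, axis=-1, keepdims=True)


def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy.

    logits: float array of shape (..., K); labels: integer array of shape (...).
    """
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    picked = np.take_along_axis(log_probs, labels[..., None], axis=-1)
    return float(-picked.mean())


def total_loss(rgb_pred, delta_pred, seg_logits, frames, seg_labels):
    """Sum of the three self-supervised terms with equal weights (hypothetical)."""
    delta_target = backward_temporal_difference_magnitude(frames)
    return (
        l2_loss(rgb_pred, frames)
        + l2_loss(delta_pred, delta_target)
        + softmax_cross_entropy(seg_logits, seg_labels)
    )
```

In this sketch, `rgb_pred` and `delta_pred` stand in for rendered QTR outputs, and `seg_logits` stands in for QSR outputs scored against segment labels. How the paper actually weights or schedules these terms is not stated in the excerpt above.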