Learning to Exploit Stability for 3D Scene Parsing
Authors: Yilun Du, Zhijian Liu, Hector Basevi, Ales Leonardis, Bill Freeman, Josh Tenenbaum, Jiajun Wu
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our framework across several synthetic and real domains: human-designed room layouts from SUNCG [Song et al., 2017], photo-realistically rendered, automatically generated room layouts from SceneNet RGB-D [McCormac et al., 2017], and real scenes from SUN RGB-D [Song et al., 2015]. We validate that our framework makes use of unlabeled data to increase reconstruction performance and demonstrate that, with physics supervision, we require fewer annotations to achieve the same performance as a fully supervised framework. |
| Researcher Affiliation | Academia | Yilun Du (MIT CSAIL), Zhijian Liu (MIT CSAIL), Hector Basevi (University of Birmingham), Aleš Leonardis (University of Birmingham), William T. Freeman (MIT CSAIL), Joshua B. Tenenbaum (MIT CSAIL), Jiajun Wu (MIT CSAIL) |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for their method. |
| Open Datasets | Yes | We evaluate our framework across several synthetic and real domains: human-designed room layouts from SUNCG [Song et al., 2017], photo-realistically rendered, automatically generated room layouts from SceneNet RGB-D [McCormac et al., 2017], and real scenes from SUN RGB-D [Song et al., 2015]. |
| Dataset Splits | Yes | We use splits of the SUNCG dataset in Tulsiani et al. [2018] with around 400,000 training images, 50,000 validation images and 100,000 test images. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions "Bullet [Coumans, 2010]" as its physics simulator but does not give a version number. Other mentions such as ResNet-18, ResNet-50, Faster R-CNN, and Mask R-CNN refer to model architectures rather than versioned software dependencies needed for reproducibility (a hedged stability-check sketch using the pybullet bindings is given after the table). |
| Experiment Setup | Yes | Our training protocol consists of three steps. We first train both our primitive prediction module and our layout prediction module using existing labeled data. Second, we train both modules with the addition of the physical stability module; we find that adding the physical stability module before pretraining leads to slow training, possibly because many possible stable positions lie far from the ground truth. Third, we finetune on the remaining semi-supervised data without 3D annotations (so when 1% of the data is labeled, we use the other 99% without 3D annotations), containing only color images and ground-truth bounding-box annotations, training the model with the physics stability module on alternating batches of supervised and semi-supervised data (a hedged outline of this three-step loop is sketched after the table). |
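
The stability module relies on the Bullet simulator [Coumans, 2010], for which no version is reported. As a rough illustration of how such a stability score can be computed, the following is a minimal sketch using the pybullet Python bindings: box primitives are dropped under gravity and their mean displacement is taken as an instability measure. The function name `stability_displacement`, the ground-plane setup, and the displacement metric are assumptions for illustration, not the paper's implementation.

```python
# Hedged sketch: scoring scene stability with the pybullet bindings for
# Bullet [Coumans, 2010]. The ground plane, box masses, and displacement
# metric are illustrative assumptions, not the paper's stability module.
import numpy as np
import pybullet as p


def stability_displacement(boxes, steps=240):
    """Drop box primitives under gravity; return their mean displacement.

    boxes: list of (half_extents, position) pairs, in metres.
    """
    client = p.connect(p.DIRECT)  # headless physics server
    p.setGravity(0, 0, -9.8, physicsClientId=client)

    # Static ground plane (mass 0) so objects have something to rest on.
    floor_shape = p.createCollisionShape(p.GEOM_BOX, halfExtents=[10, 10, 0.05],
                                         physicsClientId=client)
    p.createMultiBody(baseMass=0, baseCollisionShapeIndex=floor_shape,
                      basePosition=[0, 0, -0.05], physicsClientId=client)

    bodies, starts = [], []
    for half_extents, position in boxes:
        shape = p.createCollisionShape(p.GEOM_BOX, halfExtents=list(half_extents),
                                       physicsClientId=client)
        bodies.append(p.createMultiBody(baseMass=1.0,
                                        baseCollisionShapeIndex=shape,
                                        basePosition=list(position),
                                        physicsClientId=client))
        starts.append(np.asarray(position, dtype=float))

    for _ in range(steps):  # ~1 s at Bullet's default 240 Hz timestep
        p.stepSimulation(physicsClientId=client)

    displacements = [
        np.linalg.norm(np.asarray(p.getBasePositionAndOrientation(
            body, physicsClientId=client)[0]) - start)
        for body, start in zip(bodies, starts)
    ]
    p.disconnect(client)
    return float(np.mean(displacements))


# A box resting on the floor barely moves; one floating in mid-air falls.
print(stability_displacement([([0.2, 0.2, 0.2], [0.0, 0.0, 0.2])]))  # ~stable
print(stability_displacement([([0.2, 0.2, 0.2], [0.0, 0.0, 1.0])]))  # unstable
```

Note that a simulator-based score like this is not differentiable, so using it as a training signal generally requires a surrogate gradient or a reinforcement-style update; the sketch only shows the scoring step.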
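
The three-step protocol quoted in the Experiment Setup row can be summarized as: (1) supervised pretraining of the primitive and layout modules, (2) joint training with the physical stability term, and (3) semi-supervised finetuning that alternates labeled batches with unlabeled batches supervised only by stability. The PyTorch-style sketch below mirrors that structure; the module names (`primitive_net`, `layout_net`), loss functions, optimizer, and epoch counts are hypothetical placeholders, not the authors' code.

```python
# Hedged sketch of the three-step protocol quoted above. Module names,
# loss functions, optimizer settings, and epoch counts are illustrative
# assumptions; they are not taken from a released implementation.
import itertools

import torch


def train(primitive_net, layout_net, labeled_loader, unlabeled_loader,
          supervised_loss, stability_loss, epochs=(10, 5, 5), lr=1e-4):
    params = itertools.chain(primitive_net.parameters(), layout_net.parameters())
    optimizer = torch.optim.Adam(params, lr=lr)

    def step(loss):
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Step 1: pretrain both modules on labeled 3D annotations only.
    for _ in range(epochs[0]):
        for images, targets in labeled_loader:
            step(supervised_loss(primitive_net(images), layout_net(images), targets))

    # Step 2: add the stability term on labeled data (the paper notes that
    # adding it before pretraining slows training).
    for _ in range(epochs[1]):
        for images, targets in labeled_loader:
            primitives, layout = primitive_net(images), layout_net(images)
            step(supervised_loss(primitives, layout, targets)
                 + stability_loss(primitives, layout))

    # Step 3: semi-supervised finetuning, alternating labeled batches with
    # unlabeled batches (color images and 2D boxes only) scored by stability.
    for _ in range(epochs[2]):
        for (images, targets), (u_images, _u_boxes) in zip(labeled_loader,
                                                           unlabeled_loader):
            primitives, layout = primitive_net(images), layout_net(images)
            step(supervised_loss(primitives, layout, targets)
                 + stability_loss(primitives, layout))

            u_primitives, u_layout = primitive_net(u_images), layout_net(u_images)
            step(stability_loss(u_primitives, u_layout))
```

Alternating whole batches, rather than mixing labeled and unlabeled examples within a batch, matches the quoted description and keeps the two loss scales easy to monitor separately.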