Panoptic 3D Scene Reconstruction From a Single RGB Image
Authors: Manuel Dahnert, Ji Hou, Matthias Nießner, Angela Dai
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that this holistic view of joint scene reconstruction, semantic, and instance segmentation is beneficial over treating the tasks independently, thus outperforming alternative approaches. Table 1 shows a comparison to these baselines on synthetic 3D-Front [14] data. |
| Researcher Affiliation | Academia | Manuel Dahnert Ji Hou Matthias Nießner Angela Dai Technical University of Munich |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code can be found at https://github.com/xheon/panoptic-reconstruction. |
| Open Datasets | Yes | To train and evaluate the task of panoptic 3D scene reconstruction, we consider both synthetic and real-world datasets with dense semantic and instance annotations. 3D-Front [14] is a synthetic 3D dataset... Matterport3D [2] contains reconstructed RGB-D scans... |
| Dataset Splits | Yes | For 3D-Front: We use a train/val/test split of 4,389/489/1,206 scenes... This results in 96,252/11,204/26,933 train/val/test images. For Matterport3D: We use the official train/val/test split of 61/11/18 scenes... This results in 34,737/4,898/8,631 train/val/test images. |
| Hardware Specification | Yes | We train our approach for panoptic 3D scene reconstruction on a single RTX 2080 Ti. |
| Software Dependencies | No | The paper mentions architectural components (e.g., 'Res Net-18', 'Mask R-CNN', 'UNet-style architecture') and an optimizer ('ADAM'), but does not provide specific version numbers for software libraries or dependencies (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We first jointly pretrain the 2D encoder... with an ADAM optimizer using a batch size of 8 and a learning rate of 1e-4 for 500k iterations... The learning rate is decreased by a factor of 10 after 250k and 350k iterations. We then train the 3D sparse generative panoptic reconstruction in a coarse-to-fine fashion with a batch size of 1; the hierarchy levels are trained for 10k, 5k, and 5k iterations respectively before the next hierarchy level is added to the training. The full hierarchy is then trained for another 300k iterations. |
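
The step-decay learning-rate schedule quoted in the Experiment Setup row (1e-4 for 500k iterations, decreased by 10x after 250k and 350k iterations) can be sketched as follows. This is a minimal illustration, not the authors' code; the function name and signature are assumptions, with only the hyperparameter values taken from the paper:

```python
def learning_rate(iteration, base_lr=1e-4, milestones=(250_000, 350_000), gamma=0.1):
    """Step-decay schedule matching the paper's reported 2D pretraining:
    base learning rate 1e-4, multiplied by 0.1 at 250k and again at 350k
    iterations, over a 500k-iteration run."""
    lr = base_lr
    for milestone in milestones:
        if iteration >= milestone:
            lr *= gamma
    return lr
```

In a PyTorch training loop, the same schedule would typically be expressed with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[250_000, 350_000], gamma=0.1)` stepped once per iteration, wrapped around an `Adam` optimizer as described in the paper.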