Panoptic 3D Scene Reconstruction From a Single RGB Image

Authors: Manuel Dahnert, Ji Hou, Matthias Nießner, Angela Dai

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that this holistic view of joint scene reconstruction, semantic, and instance segmentation is beneficial over treating the tasks independently, thus outperforming alternative approaches. Table 1 shows a comparison to these baselines on synthetic 3D-Front [14] data.
Researcher Affiliation | Academia | Manuel Dahnert, Ji Hou, Matthias Nießner, Angela Dai; Technical University of Munich
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code can be found at https://github.com/xheon/panoptic-reconstruction.
Open Datasets | Yes | To train and evaluate the task of panoptic 3D scene reconstruction, we consider both synthetic and real-world datasets with dense semantic and instance annotations. 3D-Front [14] is a synthetic 3D dataset... Matterport3D [2] contains reconstructed RGB-D scans...
Dataset Splits | Yes | We use a train/val/test split of 4,389/489/1,206 scenes... This results in 96,252/11,204/26,933 train/val/test images. We use the official train/val/test split of 61/11/18 scenes... This results in 34,737/4,898/8,631 train/val/test images.
Hardware Specification | Yes | We train our approach for panoptic 3D scene reconstruction on a single RTX 2080 Ti.
Software Dependencies | No | The paper mentions architectural components (e.g., 'ResNet-18', 'Mask R-CNN', 'UNet-style architecture') and an optimizer ('ADAM'), but does not provide specific version numbers for software libraries or dependencies (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | We first jointly pretrain the 2D encoder... with an ADAM optimizer using a batch size of 8 and learning rate 1e-4 for 500k iterations... The learning rate is decreased by a factor of 10 after 250k and 350k iterations. We then train the 3D sparse generative panoptic reconstruction in coarse-to-fine fashion with a batch size of 1, with the hierarchy levels trained for 10k, 5k, and 5k iterations each before the next hierarchy level is added to the training. The full hierarchy is then trained for another 300k iterations.
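
The schedule in the Experiment Setup row can be written down as a small sketch. This is not taken from the released code: the function names, the phase labels, and the choice to apply each learning-rate drop exactly at the milestone iteration are assumptions made for illustration. In particular, the row does not say how many hierarchy levels are active in each of the three short phases, so the labels below are deliberately neutral.

```python
# Sketch of the reported training schedules. All names here are hypothetical.

def pretrain_lr(iteration, base_lr=1e-4, milestones=(250_000, 350_000), gamma=0.1):
    """Step decay for the 2D pretraining stage.

    Assumption: each drop takes effect at the milestone iteration itself.
    """
    drops = sum(1 for m in milestones if iteration >= m)
    return base_lr * gamma ** drops


# (label, iterations) for the 3D stage, in order. Labels are illustrative only.
PHASES_3D = [
    ("warm-up 1", 10_000),
    ("warm-up 2", 5_000),
    ("warm-up 3", 5_000),
    ("full hierarchy", 300_000),
]


def phase_at(iteration):
    """Return the label of the 3D-stage phase containing `iteration`, or None past the end."""
    start = 0
    for label, length in PHASES_3D:
        if iteration < start + length:
            return label
        start += length
    return None


total_3d_iterations = sum(length for _, length in PHASES_3D)

if __name__ == "__main__":
    for it in (0, 250_000, 350_000):
        print(it, pretrain_lr(it))
    print(total_3d_iterations, phase_at(12_000), phase_at(20_000))
```

Under this reading, the 3D stage runs for 10k + 5k + 5k + 300k = 320k iterations in total, after the 500k-iteration 2D pretraining.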
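
As a quick sanity check on the Dataset Splits row, the reported counts can be turned into totals and proportions. The sketch below uses only the numbers quoted above; the dictionary keys and the helper name are illustrative, not part of any dataset tooling.

```python
# Split sizes as reported, in (train, val, test) order. Keys are illustrative labels.
SPLITS = {
    "3D-Front scenes": (4_389, 489, 1_206),
    "3D-Front images": (96_252, 11_204, 26_933),
    "Matterport3D scenes": (61, 11, 18),
    "Matterport3D images": (34_737, 4_898, 8_631),
}


def split_fractions(counts):
    """Return each split's share of the total as a tuple of floats."""
    total = sum(counts)
    return tuple(c / total for c in counts)


if __name__ == "__main__":
    for name, counts in SPLITS.items():
        fracs = ", ".join(f"{f:.1%}" for f in split_fractions(counts))
        print(f"{name}: total={sum(counts)} train/val/test={fracs}")
```

The totals come to 6,084 scenes and 134,389 images for 3D-Front, and 90 scenes and 48,266 images for Matterport3D.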