Coherent 3D Scene Diffusion From a Single RGB Image
Authors: Manuel Dahnert, Angela Dai, Norman Müller, Matthias Nießner
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the following sections, we will demonstrate the advantages of our method and contributions by evaluating it against common 3D scene reconstruction benchmarks. (...) In Fig. 3, we present qualitative comparisons of our approach against state-of-the-art methods for single-view 3D scene reconstruction. (...) In Tab. 1, we quantitatively compare the single-view shape reconstruction performance of our approach against baseline methods on the Pix3D dataset. (...) We conduct a series of detailed ablation studies to verify the effectiveness of our design decisions and contributions. The quantitative results are provided in Tab. 2. |
| Researcher Affiliation | Collaboration | Manuel Dahnert1 Angela Dai1 Norman Müller2 Matthias Nießner1 1Technical University of Munich, Germany 2Meta Reality Labs Zurich, Switzerland |
| Pseudocode | No | The paper describes its methods in text and figures but does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | No | Justification: We will release the code after cleaning up and documenting it for easy usage. |
| Open Datasets | Yes | Following [23, 48, 77], we train and evaluate the performance of our 3D pose estimation on the SUN RGB-D [62] dataset with the official splits. (...) We train and evaluate the performance of our 3D shape reconstruction on the Pix3D [64] dataset, which contains images of common furniture objects with pixel-aligned 3D shapes from 9 object classes, comprising 10,046 images. |
| Dataset Splits | No | Following [23, 48, 77], we train and evaluate the performance of our 3D pose estimation on the SUN RGB-D [62] dataset with the official splits. (...) We use the train and test splits defined in [37], ensuring that 3D models between the respective splits do not overlap. |
| Hardware Specification | Yes | We train our models on a single RTX 3090 with 24 GB VRAM for 1000 epochs on Pix3D, for 500 epochs on SUN RGB-D and for 50 epochs of additional joint training using L_align. |
| Software Dependencies | No | We implement our model in PyTorch [50] and use the AdamW [41] optimizer with a learning rate of 1 × 10⁻⁴ and β1 = 0.9, β2 = 0.999. We utilize an off-the-shelf 2D instance segmentation model, Mask2Former [5], which is pre-trained on COCO [36] using a Swin Transformer [39] backbone. (See the dependency sketch after the table.) |
| Experiment Setup | Yes | For all diffusion training processes, we uniformly sample time steps t = 1, ..., T, with T = 1000, and use a linear variance schedule with β1 = 0.0001 and βT = 0.02. We implement our model in PyTorch [50] and use the AdamW [41] optimizer with a learning rate of 1 × 10⁻⁴ and β1 = 0.9, β2 = 0.999. We train our models on a single RTX 3090 with 24 GB VRAM for 1000 epochs on Pix3D, for 500 epochs on SUN RGB-D and for 50 epochs of additional joint training using L_align. During inference, we employ DDIM [61] with 100 steps to accelerate sampling speed. For classifier-free guidance [21], we drop the condition y with probability p = 0.8. (See the training and sampling sketches after the table.) |
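The software-dependencies row maps directly onto standard PyTorch components. Below is a minimal configuration sketch under stated assumptions: the placeholder network and the Mask2Former checkpoint id are illustrative choices, not the authors' (unreleased) code; only the AdamW hyperparameters and the Mask2Former/Swin/COCO combination come from the paper.

```python
import torch
from transformers import Mask2FormerForUniversalSegmentation

# Placeholder standing in for the paper's (unreleased) diffusion model.
model = torch.nn.Sequential(torch.nn.Linear(512, 512))

# AdamW hyperparameters quoted in the table: lr = 1e-4, betas = (0.9, 0.999).
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, betas=(0.9, 0.999))

# Off-the-shelf Mask2Former with a Swin backbone, pre-trained on COCO.
# This checkpoint id is one publicly available option (an assumption),
# not necessarily the exact weights the authors used.
segmenter = Mask2FormerForUniversalSegmentation.from_pretrained(
    "facebook/mask2former-swin-large-coco-instance"
)
```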
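The experiment-setup row specifies the diffusion training recipe precisely enough to sketch its forward (noising) side. The snippet below assumes a standard DDPM-style formulation: a linear variance schedule from β1 = 0.0001 to βT = 0.02 over T = 1000 steps, uniform timestep sampling, and dropping the condition with probability p = 0.8 for classifier-free guidance. Function names and the zeroed-out null condition are illustrative assumptions.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)              # linear variance schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0) # cumulative alpha-bar values

def diffusion_training_inputs(x0, y, p_drop=0.8):
    """Noise a clean batch x0 and drop conditions y for one training step."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,))                  # indices 0..T-1 for t = 1..T
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(b, *([1] * (x0.dim() - 1)))
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    # Classifier-free guidance: drop the condition with probability p = 0.8,
    # as stated in the paper; a zeroed condition serves as the null token here.
    keep = (torch.rand(b) >= p_drop).float().view(b, *([1] * (y.dim() - 1)))
    return x_t, t, noise, y * keep
```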
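For inference, the same row reports DDIM with 100 steps. Below is a minimal deterministic (η = 0) DDIM loop under the same assumed linear schedule; `eps_model` is a hypothetical stand-in for the paper's denoising network, and the classifier-free guidance mixing of conditional and unconditional predictions is omitted for brevity.

```python
import torch

T = 1000
alphas_cumprod = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, T), dim=0)

@torch.no_grad()
def ddim_sample(eps_model, shape, num_steps=100):
    """Deterministic DDIM sampling over an evenly spaced timestep subsequence."""
    step_ids = torch.linspace(T - 1, 0, num_steps).long()
    x = torch.randn(shape)
    for i, t in enumerate(step_ids):
        a_bar = alphas_cumprod[t]
        a_bar_prev = (alphas_cumprod[step_ids[i + 1]]
                      if i + 1 < num_steps else torch.tensor(1.0))
        eps = eps_model(x, t.expand(shape[0]))     # assumed (x, t) signature
        # Predict x0, then step to the previous (less noisy) timestep.
        x0_pred = (x - (1.0 - a_bar).sqrt() * eps) / a_bar.sqrt()
        x = a_bar_prev.sqrt() * x0_pred + (1.0 - a_bar_prev).sqrt() * eps
    return x
```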