TransformerFusion: Monocular RGB Scene Reconstruction using Transformers

Authors: Aljaž Božič, Pablo Palafox, Justus Thies, Angela Dai, Matthias Nießner

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 Experiments. Metrics. To evaluate our monocular scene reconstruction, we use several measures of reconstruction performance. Table 1: Quantitative comparison with baselines and ablations on the test set of the ScanNet dataset [8].
Researcher Affiliation | Academia | 1 Technical University of Munich, 2 Max Planck Institute for Intelligent Systems, Tübingen, Germany
Pseudocode | No | The paper describes the method in text and diagrams (Figure 2) but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper includes a project page URL (aljazbozic.github.io/transformerfusion) but does not contain an explicit statement about releasing source code for the method or a direct link to a code repository.
Open Datasets | Yes | To train our approach we use the ScanNet dataset [8], an RGB-D dataset of indoor apartments.
Dataset Splits | Yes | We follow the established train-val-test split.
Hardware Specification | Yes | Training takes about 30 hours using an Intel Xeon 6242R processor and an Nvidia RTX 3090 GPU.
Software Dependencies | No | The paper mentions the PyTorch library [31] but does not provide a specific version number for it or for other software dependencies.
Experiment Setup | Yes | During training, a batch size of 4 chunks is used with an Adam [23] optimizer with β1 = 0.9, β2 = 0.999, ϵ = 10⁻⁸ and weight regularization of 10⁻⁴. We use a learning rate of 10⁻⁴ with 5k warm-up steps at initialization, and square-root learning rate decay afterwards. When computing the losses of coarse and fine surface filtering predictions, a higher weight of 2.0 is applied to near-surface voxels, to increase recall and improve overall robustness.
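Since the quoted setup combines several pieces (Adam hyperparameters, a 5k-step warm-up with square-root decay, and a 2.0 loss weight on near-surface voxels), a minimal PyTorch sketch of how such a configuration could be wired up is given below. The placeholder model, the mask and loss-helper names, and the exact normalization of the square-root decay are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of the quoted training setup (not the authors' released code).
# Placeholder module standing in for the actual reconstruction network.
model = torch.nn.Linear(8, 1)

# Adam with beta1=0.9, beta2=0.999, eps=1e-8 and weight regularization of 1e-4,
# base learning rate 1e-4.
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-4,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=1e-4,
)

WARMUP_STEPS = 5_000

def lr_lambda(step: int) -> float:
    """Linear warm-up for 5k steps, then decay proportional to 1/sqrt(step).

    The exact normalization of the decay is an assumption; the paper only
    states "square root learning rate decay afterwards".
    """
    step = max(step, 1)
    if step < WARMUP_STEPS:
        return step / WARMUP_STEPS
    return (WARMUP_STEPS / step) ** 0.5

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

def surface_filtering_loss(pred_logits, target_occupancy, near_surface_mask):
    """Per-voxel BCE with a 2.0 weight on near-surface voxels, 1.0 elsewhere."""
    per_voxel = F.binary_cross_entropy_with_logits(
        pred_logits, target_occupancy, reduction="none"
    )
    weights = 1.0 + near_surface_mask.float()  # 2.0 near surface, 1.0 otherwise
    return (weights * per_voxel).mean()
```

In this sketch, scheduler.step() would be called once per training iteration (after optimizer.step()) so that the warm-up and decay apply per step rather than per epoch; the same loss helper would be evaluated for both the coarse and the fine surface filtering predictions.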