Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model
Authors: Yaxuan Huang, Xili Dai, Jianan Wang, Xianbiao Qi, Yixing Yuan, Xiangyu Yue
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that Plane-DUSt3R not only outperforms state-of-the-art methods on the synthetic dataset but also proves robust and effective on in-the-wild data with different image styles such as cartoon. [...] Section 4 (Experiments), 4.1 (Settings). Dataset. Structured3D (Zheng et al., 2020) is a synthetic dataset that provides a large collection of photo-realistic images with detailed 3D structural annotations. |
| Researcher Affiliation | Collaboration | (1) Hong Kong Center for Construction Robotics, The Hong Kong University of Science and Technology; (2) The Hong Kong University of Science and Technology (Guangzhou); (3) Astribot; (4) Intellifusion Inc.; (5) Multimedia Lab (MMLab) and SHIAE, The Chinese University of Hong Kong |
| Pseudocode | Yes | Algorithm 1 Merge Plane Require: vertical lines, horizontal lines 1: Sort vertical Lines by x-axis value 2: Initialize clusters with the first segment. |
| Open Source Code | Yes | Our code is available at: https://github.com/justacar/Plane-DUSt3R |
| Open Datasets | Yes | Dataset. Structured3D (Zheng et al., 2020) is a synthetic dataset that provides a large collection of photo-realistic images with detailed 3D structural annotations. [...] Table 2: Comparison with data-driven image matching approaches (columns: RealEstate10K, Structured3D, CAD-Estate). [...] Table 6: Quantitative results on the CAD-Estate dataset. [...] We conducted an additional evaluation on the CAD-Estate dataset (Rozumnyi et al., 2023). CAD-Estate is derived from the RealEstate10K dataset (Zhou et al., 2018). |
| Dataset Splits | Yes | Dataset. Structured3D (Zheng et al., 2020) is a synthetic dataset that provides a large collection of photo-realistic images with detailed 3D structural annotations. Similar to Yang et al. (2022), the dataset is divided into training, validation, and test sets at the scene level, comprising 3000, 250, and 250 scenes, respectively. Each scene consists of multiple rooms, with each room containing 1 to 5 images captured from different viewpoints. To construct image pairs that share similar visual content, we retain only rooms with at least two images. Within each room, images are paired to form image sets. Ultimately, we obtained 115,836 image pairs for the training set and 11,030 image pairs for the test set. |
| Hardware Specification | No | The paper does not explicitly mention specific hardware details like GPU models, CPU types, or memory specifications used for running the experiments. It discusses training parameters and datasets but omits hardware. |
| Software Dependencies | No | The paper mentions the use of 'AdamW optimizer (Loshchilov & Hutter, 2017)', 'ViT encoder (Dosovitskiy et al., 2020)', 'DPT (Ranftl et al., 2021) head', and 'HRnet network (Wang et al., 2020)', which are methodologies or model architectures. However, it does not provide specific version numbers for any software libraries, programming languages, or development environments that would be needed to reproduce the experimental setup. |
| Experiment Setup | Yes | During training, we initialize the model with the original DUSt3R checkpoint. We freeze the encoder parameters and fine-tune only the decoder and DPT heads. Our data augmentation strategy follows the same approach as DUSt3R, using an input resolution of 512 × 512. We employ the AdamW optimizer (Loshchilov & Hutter, 2017) with a cosine learning rate decay schedule, starting with a base learning rate of 1e-4 and a minimum of 1e-6. The model is trained for 20 epochs, including 2 warm-up epochs, with a batch size of 16. We train two versions of Plane-DUSt3R, one with metric-scale loss and the other without it. In our experiments, the depth consistency tolerance ϵ1 is set to 0.005. We use a threshold of τ = 15 to report RTA@15 and RRA@15 (the comprehensive results for different thresholds can be seen in Table 4 of Appendix C.1). A predicted plane is considered matched with a ground truth plane if and only if the angular difference between them is less than 10° and the offset difference is less than 0.15 m. |
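
The Experiment Setup row quotes two settings that are easy to misread in prose: the learning rate schedule and the plane matching rule. The sketch below illustrates both using only the numbers quoted above. It is not taken from the authors' repository. The function names, the linear shape of the warm-up, the per-epoch granularity, and the (normal, offset) plane representation with consistently oriented normals are all assumptions made for illustration.

```python
import math


def lr_at_epoch(epoch, base_lr=1e-4, min_lr=1e-6, total_epochs=20, warmup_epochs=2):
    """Learning rate at a (possibly fractional) epoch.

    Quoted values: base 1e-4, minimum 1e-6, 20 epochs, 2 warm-up epochs.
    Assumption: warm-up is linear from 0; the quoted text does not say.
    """
    if epoch < warmup_epochs:
        return base_lr * epoch / warmup_epochs
    # Cosine decay from base_lr to min_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def _unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def planes_match(n_pred, d_pred, n_gt, d_gt, max_angle_deg=10.0, max_offset_m=0.15):
    """Apply the quoted matching rule: angle < 10 degrees and offset gap < 0.15 m.

    Assumption: each plane is given as (normal vector, offset in meters)
    and both normals are oriented consistently.
    """
    a, b = _unit(n_pred), _unit(n_gt)
    # Clamp to guard acos against rounding just outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    angle_deg = math.degrees(math.acos(cos_angle))
    return angle_deg < max_angle_deg and abs(d_pred - d_gt) < max_offset_m


if __name__ == "__main__":
    for e in (0, 1, 2, 11, 20):
        print(f"epoch {e:2d}: lr = {lr_at_epoch(e):.2e}")
    tilted = (0.0, math.sin(math.radians(5)), math.cos(math.radians(5)))
    print(planes_match(tilted, 1.0, (0.0, 0.0, 1.0), 1.1))
```

Under these assumptions the rate peaks at the base value when warm-up ends and reaches the minimum at the final epoch. Both strict inequalities in `planes_match` mirror the "less than" wording of the quoted rule.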