Flora: Dual-Frequency LOss-Compensated ReAl-Time Monocular 3D Video Reconstruction

Authors: Likang Wang, Yue Gong, Qirui Wang, Kaixuan Zhou, Lei Chen

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our state-of-the-art experimental results on real-world datasets demonstrate Flora's leading performance in both effectiveness and efficiency." "We first illustrate the experimental settings in Section 4.1 and then show the results in Section 4.2." (Section 4, Experiments)
Researcher Affiliation | Collaboration | Likang Wang (1), Yue Gong (3), Qirui Wang (3), Kaixuan Zhou (4,5), Lei Chen (1,2); (1) Department of Computer Science and Engineering, The Hong Kong University of Science and Technology; (2) Data Science and Analytics Thrust, The Hong Kong University of Science and Technology (Guangzhou); (3) Distributed and Parallel Software Lab, Huawei Technologies; (4) Riemann Lab, Huawei Technologies; (5) Fundamental Software Innovation Lab, Huawei Technologies
Pseudocode | No | The paper describes the proposed methods in text and figures but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "The code is available at https://github.com/NoOneUST/Flora."
Open Datasets | Yes | "Datasets: We test our model on the most popular indoor video reconstruction dataset: ScanNet (Dai et al. 2017)."
Dataset Splits | Yes | "We split ScanNet following its standard settings. We use the F-score on the validation set to distinguish each epoch's quality." (A minimal F-score sketch follows the table.)
Hardware Specification | Yes | "On ScanNet, we achieve an F-score of 58.4%, 2.2% higher than the current SOTA (NeuralRecon), while running in real time at an FPS of 30 on a single RTX 2080 Ti GPU."
Software Dependencies | Yes | "The model is constructed using PyTorch (2019), the sparse operations are implemented with TorchSparse (2022), and the 2D feature extraction module is a pre-trained MNasNet (2019) model." (A dependency sketch follows the table.)
Experiment Setup | Yes | "Hyper-parameters: We adopt MADGRAD (Defazio and Jelassi 2022) as the optimizer, and the learning rate is set to 10^-3. Each feature volume is aggregated from nine views and is a cube of side length [24, 48, 96] voxels at the three stages. Our hierarchical framework contains three layers in total to balance efficiency and effectiveness. The finest voxel size is 4 cm, and the TSDF truncation distance is 12 cm." (A hyper-parameter sketch follows the table.)
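
The F-score quoted in the Dataset Splits and Hardware Specification rows is the standard point-cloud reconstruction metric: the harmonic mean of precision and recall under a nearest-neighbour distance threshold. Below is a minimal sketch; the 5 cm threshold follows the common ScanNet evaluation protocol (as in NeuralRecon/Atlas) and is an assumption here, not a detail stated in the rows above.

```python
# Minimal sketch of the point-cloud F-score for 3D reconstruction.
# Assumption: 5 cm distance threshold (common ScanNet protocol).
import numpy as np
from scipy.spatial import cKDTree

def f_score(pred_pts: np.ndarray, gt_pts: np.ndarray, tau: float = 0.05) -> float:
    """pred_pts, gt_pts: (N, 3) point clouds in meters; tau: threshold in meters."""
    dist_pred_to_gt, _ = cKDTree(gt_pts).query(pred_pts)   # nearest GT point per prediction
    dist_gt_to_pred, _ = cKDTree(pred_pts).query(gt_pts)   # nearest prediction per GT point
    precision = (dist_pred_to_gt < tau).mean()
    recall = (dist_gt_to_pred < tau).mean()
    return 2 * precision * recall / max(precision + recall, 1e-8)
```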
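
A minimal sketch of how the listed dependencies typically fit together: a torchvision MNasNet backbone for 2D features and a TorchSparse SparseTensor holding the sparse 3D volume. The backbone truncation point and the coordinate layout are assumptions for illustration, not details taken from the Flora code.

```python
import torch
import torchvision
from torchsparse import SparseTensor  # pip install torchsparse

# 2D feature extractor: ImageNet-pretrained MNasNet with the classifier
# head dropped. Truncating at `.layers` is an assumption; Flora's exact
# feature tap points may differ.
mnasnet = torchvision.models.mnasnet1_0(weights="IMAGENET1K_V1")
backbone = mnasnet.layers

images = torch.randn(9, 3, 480, 640)  # nine views, per the paper
feats_2d = backbone(images)           # (9, C, H', W') feature maps

# Sparse 3D volume: N occupied voxels carrying C-dim features.
# torchsparse 1.x expects integer coords as (x, y, z, batch); 2.x puts
# the batch index first -- check the installed version.
coords = torch.randint(0, 96, (1000, 4), dtype=torch.int32)
coords[:, 3] = 0                      # single batch element
feats_3d = torch.randn(1000, 32)
volume = SparseTensor(feats_3d, coords)
```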
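
The hyper-parameters above can be collected into a small training config; the sketch below uses the `madgrad` package's MADGRAD optimizer. The dictionary keys and the stand-in model are invented for illustration, not names from the Flora repository.

```python
import torch
from madgrad import MADGRAD  # pip install madgrad

# Values copied from the Experiment Setup row; key names are illustrative.
CONFIG = {
    "n_views": 9,                  # views aggregated per feature volume
    "volume_sides": (24, 48, 96),  # cubic side length (voxels) at the three stages
    "n_stages": 3,                 # hierarchical layers
    "voxel_size_m": 0.04,          # finest voxel size: 4 cm
    "tsdf_trunc_m": 0.12,          # TSDF truncation distance: 12 cm
    "lr": 1e-3,
}

model = torch.nn.Linear(8, 1)  # stand-in for the actual Flora network
optimizer = MADGRAD(model.parameters(), lr=CONFIG["lr"])

# TSDF truncation: signed distances are clamped to [-trunc, trunc] and
# normalised to [-1, 1] before supervision.
def truncate_sdf(sdf: torch.Tensor, trunc: float = CONFIG["tsdf_trunc_m"]) -> torch.Tensor:
    return torch.clamp(sdf / trunc, -1.0, 1.0)
```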