DiViNeT: 3D Reconstruction from Disparate Views via Neural Template Regularization

Authors: Aditya Vora, Akshay Gadi Patil, Hao Zhang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct extensive experiments on two real-world object-scenes datasets, viz., DTU [16] and Blended MVS [53], to show the efficiency of our method over existing approaches on the surface reconstruction task, for both sparse and dense view input settings. Through ablation studies, we validate the design choices of our network in the context of different optimization constraints employed during training."
Researcher Affiliation | Collaboration | Aditya Vora (1), Akshay Gadi Patil (1), Hao Zhang (1, 2); (1) Simon Fraser University, (2) Amazon
Pseudocode | No | The paper describes the method and its components in text and diagrams but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | The paper does not contain any explicit statement about open-source code release, nor does it provide a link to a code repository.
Open Datasets | Yes | "We use two commonly employed real-world object-scenes datasets, viz., DTU [16] and Blended MVS dataset [53]."
Dataset Splits | No | While the paper mentions 'training split' and 'test scans' for DTU, it does not provide specific numerical details (percentages, counts) for training, validation, and test splits, nor does it reference predefined validation splits.
Hardware Specification | Yes | "We train the network for 300k iterations which takes roughly 8 hours on NVIDIA RTX 3090Ti GPU."
Software Dependencies | No | "Our implementation is based on the PyTorch framework [38]."
Experiment Setup | Yes | "For both stages, we use the Adam optimizer [20] to train our networks and set a learning rate of 5e-4. For stage-1, we set λ_cd, λ_c, λ_r and λ_v to 1.0, 0.1, 0.1, and 1.0, respectively. And for stage-2, we set λ_1, λ_2 and λ_3 to 1.0, 0.8 and 0.8, respectively. As the templates are not perfect, we follow [58] and use an exponentially decaying loss weight for both SDF and depth regularization for the first 25k iterations of optimization. During training in both stages, we sample a batch of 512 rays in each iteration. During both stages, we assume that the object is within the unit sphere. Our SDF network f_θs is an 8-layer MLP with 256 hidden units with a skip connection in the middle. The weights of the SDF network are initialized by geometric initialization [2]. The color MLP is a 4-layer MLP with 256 hidden units. The 3D position is encoded with 6 frequencies, whereas the viewing direction is encoded with 4 frequencies. We train the network for 300k iterations which takes roughly 8 hours on NVIDIA RTX 3090Ti GPU. After training, we extract the mesh using marching cubes [26] at a 512^3 resolution. In our experiments, we set N_t = 576, C = 8 and M = 16."
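
To make the quoted setup concrete, here is a minimal PyTorch sketch of the SDF network and the decaying regularization weight described above. The class and helper names (SDFNetwork, positional_encoding, reg_weight) are hypothetical, and several details the paper does not specify are filled in from common neural-SDF practice (IDR/NeuS-style implementations): the exact position of the skip layer, the Softplus activation, the precise geometric-initialization recipe, and the rate of the exponential decay. This is a sketch under those assumptions, not the authors' implementation.

```python
import math
import torch
import torch.nn as nn

def positional_encoding(x, num_freqs):
    # Sin/cos encoding at num_freqs octaves; the raw input is kept as well.
    # The paper uses 6 frequencies for 3D position and 4 for view direction.
    feats = [x]
    for i in range(num_freqs):
        feats.append(torch.sin((2.0 ** i) * math.pi * x))
        feats.append(torch.cos((2.0 ** i) * math.pi * x))
    return torch.cat(feats, dim=-1)

class SDFNetwork(nn.Module):
    # 8-layer MLP with 256 hidden units and a skip connection "in the middle"
    # (placed at layer 4 here -- an assumption). Geometric initialization [2]
    # makes the initial output approximate the SDF of a sphere of given radius.
    def __init__(self, num_freqs=6, hidden=256, num_layers=8, radius=1.0):
        super().__init__()
        self.num_freqs = num_freqs
        self.skip = num_layers // 2
        in_dim = 3 * (2 * num_freqs + 1)  # 39 dims for 6 frequencies
        self.layers = nn.ModuleList()
        for l in range(num_layers):
            d_in = in_dim if l == 0 else hidden
            if l == self.skip:
                d_in += in_dim  # encoded input is concatenated back in at the skip
            d_out = 1 if l == num_layers - 1 else hidden
            lin = nn.Linear(d_in, d_out)
            if l == num_layers - 1:
                # Last layer: positive mean weight, bias -radius (sphere-like SDF).
                nn.init.normal_(lin.weight,
                                mean=math.sqrt(math.pi) / math.sqrt(d_in),
                                std=1e-4)
                nn.init.constant_(lin.bias, -radius)
            else:
                nn.init.normal_(lin.weight, 0.0, math.sqrt(2.0) / math.sqrt(d_out))
                nn.init.constant_(lin.bias, 0.0)
            self.layers.append(lin)
        self.act = nn.Softplus(beta=100)  # a common activation with geometric init

    def forward(self, xyz):
        x_enc = positional_encoding(xyz, self.num_freqs)
        h = x_enc
        for l, lin in enumerate(self.layers):
            if l == self.skip:
                h = torch.cat([h, x_enc], dim=-1)
            h = lin(h)
            if l < len(self.layers) - 1:
                h = self.act(h)
        return h  # (N, 1) signed distances

def reg_weight(base, it, decay_iters=25_000, rate=10.0):
    # Exponentially decaying weight for the template SDF/depth regularizers over
    # the first 25k iterations; the decay rate and post-25k behavior are assumptions.
    return base * math.exp(-rate * it / decay_iters) if it < decay_iters else 0.0

# Usage matching the quoted hyperparameters: Adam, lr 5e-4, 512 rays per batch.
sdf = SDFNetwork()
optimizer = torch.optim.Adam(sdf.parameters(), lr=5e-4)
pts = torch.rand(512, 3) * 2.0 - 1.0   # sample points inside the unit cube
print(sdf(pts).shape)                  # torch.Size([512, 1])
```

The 4-layer color MLP and the neural templates themselves are omitted; per the quoted setup, the same encoding helper would be reused with num_freqs=4 for viewing directions.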