LRM-Zero: Training Large Reconstruction Models with Synthesized Data

Authors: Desai Xie, Sai Bi, Zhixin Shu, Kai Zhang, Zexiang Xu, Yi Zhou, Soeren Pirk, Arie Kaufman, Xin Sun, Hao Tan

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that our LRM-Zero, trained with our fully synthesized Zeroverse, can achieve high visual quality in the reconstruction of real-world objects, competitive with models trained on Objaverse.
Researcher Affiliation | Collaboration | Desai Xie (1,2), Sai Bi (1), Zhixin Shu (1), Kai Zhang (1), Zexiang Xu (1), Yi Zhou (1), Sören Pirk (3), Arie Kaufman (2), Xin Sun (1), Hao Tan (1); 1: Adobe Research, 2: Stony Brook University, 3: Kiel University
Pseudocode | No | The paper describes the model and data generation process but does not include formal pseudocode blocks or algorithms.
Open Source Code | Yes | The Zeroverse's procedural synthesis code and interactive visualization are available at: https://desaixie.github.io/lrm-zero/. ... The Zeroverse data synthesis script is released at https://github.com/desaixie/zeroverse, and we hope that it can facilitate future research.
Open Datasets | Yes | We present LRM-Zero, a Large Reconstruction Model (LRM) trained entirely on synthesized 3D data, achieving high-quality sparse-view 3D reconstruction. The core of LRM-Zero is our procedural 3D dataset, Zeroverse... The Zeroverse data synthesis script is released at https://github.com/desaixie/zeroverse... We also quantitatively evaluate the model on two standard 3D reconstruction benchmarks, ABO [18] and GSO [28].
Dataset Splits | No | The paper does not explicitly mention validation dataset splits with percentages or counts.
Hardware Specification | Yes | The overall training uses 64 A100 GPUs and takes 3 days.
Software Dependencies | No | In detail, we use Blender's boolean modifier and solidify modifier to augment the initial shape. ... We use Blender's wireframe modifier and subdivision modifier to create the wireframe of a primitive shape. (No version numbers are provided; a hedged bpy sketch of these modifier calls appears after the table.)
Experiment Setup | Yes | For rendering the multi-view images of Zeroverse, we follow [41]. For each object in Zeroverse, we render 32 views with randomly sampled camera rotations and random distances in the range of [2.0, 3.0]. Each image is rendered at 512×512 resolution with uniform lighting. We use the same network architecture and follow the hyperparameters/implementation details (e.g., 80K training steps) as GS-LRM [107]. We only decrease the perceptual loss weight from 0.5 to 0.2 to improve training stability. For the results comparison, we pre-train the model at 256 resolution and fine-tune at 512 resolution, following GS-LRM.
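
The released Zeroverse script is the authoritative implementation; purely as a rough illustration of the Blender modifiers named in the Software Dependencies row, the following Blender Python (bpy) sketch applies the boolean, solidify, subdivision, and wireframe modifiers to a primitive mesh. The object setup and parameter values (size, thickness, levels) are assumptions for illustration, not the paper's settings.

# Hedged sketch: applying the Blender modifiers named in the paper to a primitive.
# Object setup and parameter values are illustrative assumptions, not Zeroverse's settings.
import bpy

# Create two primitives: a cube to augment and a sphere to cut with.
bpy.ops.mesh.primitive_cube_add(size=1.0, location=(0, 0, 0))
cube = bpy.context.active_object
bpy.ops.mesh.primitive_uv_sphere_add(radius=0.6, location=(0.4, 0.0, 0.4))
sphere = bpy.context.active_object

# Boolean modifier: carve the sphere out of the cube to vary the base shape.
boolean = cube.modifiers.new(name="Boolean", type='BOOLEAN')
boolean.operation = 'DIFFERENCE'
boolean.object = sphere

# Solidify modifier: give thin surfaces a finite thickness.
solidify = cube.modifiers.new(name="Solidify", type='SOLIDIFY')
solidify.thickness = 0.05

# Subdivision + wireframe modifiers: build a wireframe variant of a primitive shape.
subsurf = cube.modifiers.new(name="Subdivision", type='SUBSURF')
subsurf.levels = 2
wireframe = cube.modifiers.new(name="Wireframe", type='WIREFRAME')
wireframe.thickness = 0.02

# Apply the modifier stack so the geometry is baked into the mesh.
bpy.context.view_layer.objects.active = cube
for mod in list(cube.modifiers):
    bpy.ops.object.modifier_apply(modifier=mod.name)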
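
The rendering setup in the Experiment Setup row (32 views per object, random camera rotations, distances drawn from [2.0, 3.0], 512×512 images, uniform lighting) can be sketched with the minimal NumPy snippet below. This is not the authors' rendering code; the look-at convention and the use of a uniform random viewing direction are assumptions made for illustration.

# Hedged sketch of the described camera sampling: 32 views per object, random
# rotation (via a uniformly sampled viewing direction) and a random distance in
# [2.0, 3.0]. The camera-to-world convention is an assumption.
import numpy as np

def sample_cameras(num_views=32, d_min=2.0, d_max=3.0, seed=0):
    rng = np.random.default_rng(seed)
    cameras = []
    for _ in range(num_views):
        # Uniform random direction on the sphere and a random distance to the object.
        direction = rng.normal(size=3)
        direction /= np.linalg.norm(direction)
        distance = rng.uniform(d_min, d_max)
        position = direction * distance

        # Build a camera-to-world rotation that looks at the origin (object center).
        forward = -position / np.linalg.norm(position)
        up_hint = np.array([0.0, 0.0, 1.0])
        right = np.cross(forward, up_hint)
        if np.linalg.norm(right) < 1e-6:  # camera directly above/below the object
            right = np.array([1.0, 0.0, 0.0])
        right /= np.linalg.norm(right)
        up = np.cross(right, forward)

        c2w = np.eye(4)
        c2w[:3, 0], c2w[:3, 1], c2w[:3, 2] = right, up, -forward
        c2w[:3, 3] = position
        cameras.append(c2w)
    return cameras

views = sample_cameras()  # 32 camera-to-world matrices, e.g. to drive 512x512 renders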