LRM-Zero: Training Large Reconstruction Models with Synthesized Data
Authors: Desai Xie, Sai Bi, Zhixin Shu, Kai Zhang, Zexiang Xu, Yi Zhou, Sören Pirk, Arie Kaufman, Xin Sun, Hao Tan
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that our LRM-Zero, trained with our fully synthesized Zeroverse, can achieve high visual quality in the reconstruction of real-world objects, competitive with models trained on Objaverse. |
| Researcher Affiliation | Collaboration | Desai Xie¹,² Sai Bi¹ Zhixin Shu¹ Kai Zhang¹ Zexiang Xu¹ Yi Zhou¹ Sören Pirk³ Arie Kaufman² Xin Sun¹ Hao Tan¹ (¹Adobe Research, ²Stony Brook University, ³Kiel University) |
| Pseudocode | No | The paper describes the model and data generation process but does not include formal pseudocode blocks or algorithms. |
| Open Source Code | Yes | The Zeroverse's procedural synthesis code and interactive visualization are available at: https://desaixie.github.io/lrm-zero/. ... The Zeroverse data synthesis script is released at https://github.com/desaixie/zeroverse, and we hope that it can facilitate future research. |
| Open Datasets | Yes | We present LRM-Zero, a Large Reconstruction Model (LRM) trained entirely on synthesized 3D data, achieving high-quality sparse-view 3D reconstruction. The core of LRM-Zero is our procedural 3D dataset, Zeroverse... The Zeroverse data synthesis script is released at https://github.com/desaixie/zeroverse... We also quantitatively evaluate the model on two standard 3D reconstruction benchmarks, ABO [18] and GSO [28]. |
| Dataset Splits | No | The paper does not explicitly mention validation dataset splits with percentages or counts. |
| Hardware Specification | Yes | The overall training uses 64 A100 GPUs and takes 3 days. |
| Software Dependencies | No | In detail, we use Blender's boolean modifier and solidify modifier to augment the initial shape. ... We use Blender's wireframe modifier and subdivision modifier to create the wireframe of a primitive shape. (No version numbers provided; see the Blender sketch after the table.) |
| Experiment Setup | Yes | For rendering the multi-view images of Zeroverse, we follow [41]. For each object in Zeroverse, we render 32 views with randomly sampled camera rotations and random distances in the range of [2.0, 3.0]. Each image is rendered at 512×512 resolution with uniform lighting. We use the same network architecture and follow the hyperparameters/implementation details (e.g., 80K training steps) of GS-LRM [107]. We only decrease the perceptual loss weight from 0.5 to 0.2 to improve training stability. For the results comparison, we pre-train the model at 256 resolution and fine-tune at 512 resolution, following GS-LRM. (See the camera-sampling sketch after the table.) |
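
The Software Dependencies row quotes the paper's use of four Blender modifiers without version numbers. As a minimal sketch of how those modifiers could be applied through Blender's Python API (to be run inside Blender): the object names and thickness/level values below are assumptions for illustration, not values from the paper.

```python
import random
import bpy  # Blender's Python API; run inside Blender

def augment_shape(obj, cutter):
    """Boolean + solidify modifiers on a base primitive, mirroring the
    augmentation the paper describes. `obj`, `cutter`, and the
    thickness range are illustrative assumptions."""
    # Boolean modifier: carve the base shape with a second primitive.
    boolean = obj.modifiers.new(name="Boolean", type='BOOLEAN')
    boolean.operation = 'DIFFERENCE'
    boolean.object = cutter
    # Solidify modifier: give the resulting surface a finite thickness.
    solidify = obj.modifiers.new(name="Solidify", type='SOLIDIFY')
    solidify.thickness = random.uniform(0.02, 0.10)  # assumed range

def make_wireframe(obj):
    """Subdivision + wireframe modifiers, as used for the wireframe
    variant of a primitive shape."""
    subdiv = obj.modifiers.new(name="Subdivision", type='SUBSURF')
    subdiv.levels = 2  # assumed subdivision level
    wire = obj.modifiers.new(name="Wireframe", type='WIREFRAME')
    wire.thickness = random.uniform(0.01, 0.05)  # assumed range
```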
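
The Experiment Setup row describes rendering 32 views per object with randomly sampled camera rotations and distances in [2.0, 3.0]. A minimal sketch of that sampling, assuming uniform directions on the sphere (the exact distribution used in [41] is not quoted):

```python
import numpy as np

def sample_camera_positions(num_views=32, dist_range=(2.0, 3.0), seed=0):
    """Sample `num_views` camera positions on random viewing directions
    at random distances from the object center, matching the quoted
    setup (32 views, distances in [2.0, 3.0])."""
    rng = np.random.default_rng(seed)
    # Uniform directions on the unit sphere via normalized Gaussians.
    dirs = rng.normal(size=(num_views, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    dists = rng.uniform(*dist_range, size=(num_views, 1))
    return dirs * dists  # (num_views, 3) positions, cameras look at origin

positions = sample_camera_positions()
assert positions.shape == (32, 3)
```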