GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation

Authors: Chubin Zhang, Hongliang Song, Yi Wei, Yu Chen, Jiwen Lu, Yansong Tang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results demonstrate that GeoLRM significantly outperforms existing models, especially for dense view inputs. We also demonstrate the practical applicability of our model with 3D generation tasks, showcasing its versatility and potential for broader adoption in real-world applications.
Researcher Affiliation | Collaboration | Chubin Zhang (1,3), Hongliang Song (3), Yi Wei (2), Yu Chen (3), Jiwen Lu (2), Yansong Tang (1, corresponding author). 1: Tsinghua Shenzhen International Graduate School, Tsinghua University; 2: Department of Automation, Tsinghua University; 3: Alibaba Group. Contact: {zcb24, y-wei19}@mails.tsinghua.edu.cn; {hongliang.shl, chenyu.cheny}@alibaba-inc.com; lujiwen@tsinghua.edu.cn; tang.yansong@sz.tsinghua.edu.cn.
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Project page: https://linshan-bin.github.io/GeoLRM/.
Open Datasets | Yes | G-buffer Objaverse (GObjaverse) [41]: used for training. Derived from the original Objaverse [12] dataset, GObjaverse includes high-quality renderings of albedo, RGB, depth, and normal images, generated through a hybrid technique combining rasterization and path tracing. The dataset comprises approximately 280,000 normalized 3D models scaled to fit within the cube [-0.5, 0.5]^3. GObjaverse employs a diverse camera setup: two orbital paths yielding 36 views per model, comprising 24 views at elevations between 5° and 30° (at 15° azimuth increments) and 12 views at near-horizontal elevations from -5° to 5° (at 30° azimuth increments), plus additional top and bottom views for comprehensive spatial coverage (a camera-layout sketch follows the table).
Dataset Splits | No | The paper mentions training on GObjaverse and testing on GSO and OmniObject3D, but does not explicitly describe a validation split or its size. It mentions selecting views for supervision and inputs, but not a dedicated validation set.
Hardware Specification | Yes | We train both the proposal transformer and the reconstruction transformer for 12 epochs on GObjaverse [41], which takes 0.5 and 2 days respectively on 32 A100 40G GPUs.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., PyTorch 1.9, Python 3.8). It mentions DINOv2 and AdamW, but these are a model and an optimizer, not system-level software dependencies with versions.
Experiment Setup | Yes | During training, we maintain a maximum of 4k transformer input tokens and randomly select 8 views from a possible 38 for supervision. From these 8 views, we randomly select 1 to 7 views as inputs to predict the remaining views. This flexibility in view selection not only tests the robustness of our method but also mimics real-world scenarios where complete data may not always be available. Both input and rendering resolutions are maintained at 448×448 pixels. At the testing and inference stages, we use a resolution of 512×512 to align with existing methods. Besides, the number of input tokens is extended to 16k during testing, showcasing scalability without the need for fine-tuning. Detailed information on the model's architecture and training procedures can be found in Section A.3. (A view-sampling sketch follows the table.)
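
For concreteness, here is a minimal Python sketch of the 38-view GObjaverse camera layout described in the Open Datasets row (24 + 12 orbital views plus top and bottom). Only the view counts, elevation bands, and azimuth spacings come from the text; the exact per-view elevation inside each band is not specified, so this sketch samples it uniformly, and the function name is hypothetical.

```python
import numpy as np

def gobjaverse_view_angles(rng=np.random.default_rng(0)):
    """Enumerate 38 (azimuth, elevation) pairs in degrees, as described above."""
    views = []
    # Orbit 1: 24 views, elevations in the [5, 30] degree band, azimuth every 15 degrees.
    for i in range(24):
        azimuth = 15.0 * i
        elevation = rng.uniform(5.0, 30.0)  # assumption: band is sampled uniformly
        views.append((azimuth, elevation))
    # Orbit 2: 12 near-horizontal views, elevations in [-5, 5] degrees, azimuth every 30 degrees.
    for i in range(12):
        azimuth = 30.0 * i
        elevation = rng.uniform(-5.0, 5.0)  # assumption: band is sampled uniformly
        views.append((azimuth, elevation))
    # Additional top and bottom views for full spatial coverage.
    views.append((0.0, 90.0))
    views.append((0.0, -90.0))
    return views  # 24 + 12 + 2 = 38 viewpoints
```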
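
The training-time view selection in the Experiment Setup row can likewise be sketched in a few lines of Python. The 8-of-38 draw and the 1-to-7 input split follow the quoted text; the names, the uniform sampling, and the expansion of "4k"/"16k" to 4,096/16,384 tokens are assumptions, not taken from the released code.

```python
import random

NUM_DATASET_VIEWS = 38   # views available per object in GObjaverse
VIEWS_PER_SAMPLE = 8     # views drawn per training sample
MAX_TRAIN_TOKENS = 4_096   # "4k" input tokens at training time (assumed exact value)
MAX_TEST_TOKENS = 16_384   # "16k" at test time, without fine-tuning (assumed exact value)

def sample_training_views(rng=random):
    """Split one training sample into input views and supervision targets."""
    selected = rng.sample(range(NUM_DATASET_VIEWS), VIEWS_PER_SAMPLE)
    num_inputs = rng.randint(1, VIEWS_PER_SAMPLE - 1)  # 1 to 7 input views
    input_views = selected[:num_inputs]
    target_views = selected[num_inputs:]  # the remaining views are predicted
    return input_views, target_views

input_views, target_views = sample_training_views()
print(f"{len(input_views)} input views -> {len(target_views)} supervision views")
```

Renders would then be produced at 448×448 during training and 512×512 at test time, per the quoted setup.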