Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
AGG: Amortized Generative 3D Gaussians for Single Image to 3D
Authors: Dejia Xu, Ye Yuan, Morteza Mardani, Sifei Liu, Jiaming Song, Zhangyang Wang, Arash Vahdat
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method is evaluated against existing optimization-based 3D Gaussian frameworks and sampling-based pipelines utilizing other 3D representations, where AGG showcases competitive generation abilities both qualitatively and quantitatively while being several orders of magnitude faster. ... We conduct thorough experiments to validate the effectiveness of our proposed components using the available 3D ground truth from the dataset. We compare with multiple variants of the model. ... We provide visual comparisons and quantitative analysis between our methods and existing baselines in Fig. 4 and Tab. 1. |
| Researcher Affiliation | Collaboration | Dejia Xu (University of Texas at Austin); Ye Yuan (NVIDIA); Morteza Mardani (NVIDIA); Sifei Liu (NVIDIA); Jiaming Song (NVIDIA); Zhangyang Wang (University of Texas at Austin); Arash Vahdat (NVIDIA) |
| Pseudocode | No | The paper describes methods and processes in narrative text and with architectural diagrams (e.g., Figure 1, Figure 2, Figure 3), but it does not contain any clearly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps. |
| Open Source Code | No | Project page: https://ir1d.github.io/AGG/ The paper provides a project page URL but does not explicitly state that the source code for the methodology is available there or elsewhere. A project page alone is not considered sufficient evidence of released code unless it explicitly says code is available. |
| Open Datasets | Yes | Our model is trained on the OmniObject3D dataset (Wu et al., 2023b), which contains high-quality scans of real-world objects. |
| Dataset Splits | Yes | We construct our training set using 2,370 objects from 73 classes in total. We train one model using all classes. The test set contains 146 objects, with two left-out objects per class. |
| Hardware Specification | Yes | The runtime of our method is measured on the test set using an A100 GPU. |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer' and 'DINOv2-base model' but does not provide specific version numbers for these or other key software libraries and languages used. |
| Experiment Setup | Yes | The model is trained with the Adam optimizer (Kingma & Ba, 2014). Rendering loss is enforced at 128×128 resolution. We first train the coarse hybrid generator for ten epochs. The learning rate is set to 1e-4 maximum, with a warmup stage for three epochs followed by a cosine annealing learning rate scheduler. During the warmup epochs, we set the loss weight of Lchamfer to ten and Lrendering to one. Later, we gradually reduce the weight of Lchamfer and increase the weight of Lrendering until the weight of Lrendering is ten and that of Lchamfer is one. Then, we freeze the parameters of the coarse hybrid generator and only train the Gaussian super-resolution module. We use Lrendering and optimize for five epochs, with a learning rate set to 1e-4. Finally, we unfreeze all the parameters in both modules and train with rendering loss Lrendering for three epochs, with a learning rate set to 1e-5. For Lrendering, we set ω1 = 2 for all epochs. ... The input image is 256×256 resolution... In each iteration, we render the generated 3D Gaussians into eight novel views... |
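The coarse-stage schedule quoted above (three warmup epochs, peak learning rate 1e-4, cosine annealing, and a Chamfer-to-rendering loss-weight swap from 10:1 to 1:10) can be sketched as a small helper. This is a minimal illustration, not the authors' code: the paper's excerpt does not specify the interpolation used for the loss weights, so linear ramps are assumed here, and the function name and 0-indexed epoch convention are hypothetical.

```python
import math

def coarse_stage_schedule(epoch, total_epochs=10, warmup_epochs=3, peak_lr=1e-4):
    """Return (lr, w_chamfer, w_rendering) for a 0-indexed training epoch.

    Sketch of the coarse hybrid generator stage described in the paper:
    linear LR warmup for the first epochs, then cosine annealing, while the
    Chamfer loss weight decays from 10 to 1 and the rendering loss weight
    grows from 1 to 10 (interpolation assumed linear).
    """
    if epoch < warmup_epochs:
        # Linear warmup toward the peak learning rate; Chamfer-dominated losses.
        lr = peak_lr * (epoch + 1) / warmup_epochs
        w_chamfer, w_rendering = 10.0, 1.0
    else:
        # Cosine annealing from the peak LR down toward zero.
        progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
        lr = 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
        # Gradually swap the loss weights: Chamfer 10 -> 1, rendering 1 -> 10.
        w_chamfer = 10.0 - 9.0 * progress
        w_rendering = 1.0 + 9.0 * progress
    return lr, w_chamfer, w_rendering
```

After this stage, the paper freezes the coarse generator and trains the super-resolution module for five epochs at 1e-4, then fine-tunes everything for three epochs at 1e-5 with the rendering loss only.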