Instant3D: Fast Text-to-3D with Sparse-view Generation and Large Reconstruction Model
Authors: Jiahao Li, Hao Tan, Kai Zhang, Zexiang Xu, Fujun Luan, Yinghao Xu, Yicong Hong, Kalyan Sunkavalli, Greg Shakhnarovich, Sai Bi
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, we demonstrate that our method can generate diverse 3D assets of high visual quality within 20 seconds, which is two orders of magnitude faster than previous optimization-based methods that can take 1 to 10 hours. Our project webpage is: https://jiahao.ai/instant3d/. |
| Researcher Affiliation | Collaboration | Jiahao Li1,2 Hao Tan1 Kai Zhang1 Zexiang Xu1 Fujun Luan1 Yinghao Xu1,3 Yicong Hong1,4 Kalyan Sunkavalli1 Greg Shakhnarovich2 Sai Bi1 1Adobe Research 2TTIC 3Stanford University 4Australian National University |
| Pseudocode | No | The paper describes its methods in prose and uses architectural diagrams (Figure 2, Figure 3) but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: "Our project webpage is: https://jiahao.ai/instant3d/." While a project webpage might contain links to code, the paper does not explicitly state that the source code for the methodology is openly available or provide a direct repository link. |
| Open Datasets | Yes | We adopt a large-scale synthetic 3D dataset Objaverse (Deitke et al., 2023b) and render four 512 × 512 views of about 750K objects with Blender. ... We train the model on multi-view renderings of the Objaverse dataset (Deitke et al., 2023b). |
| Dataset Splits | No | The paper describes selecting a subset of images for input and another for supervision during training, but it does not explicitly mention a separate validation set for model tuning or early stopping, nor does it report split sizes. |
| Hardware Specification | Yes | The training is done using 32 NVIDIA A100 GPUs for only 3 hours. ... The timing is measured using the default hyper-parameters of each method on an A100 GPU. ... The model is trained on 128 NVIDIA A100 GPUs and the whole training can be finished in 7 days. |
| Software Dependencies | No | The paper mentions several software components and frameworks like Blender, CLIP, BLIP-2, LLM, DINO-ViT, Flash Attention, scikit-learn, and PyTorch (e.g., "We use the NuSVC implementation from the popular scikit-learn framework"), but it does not specify exact version numbers for any of them. |
| Experiment Setup | Yes | We use AdamW optimizer with a fixed learning rate 10^-5, β1 = 0.9, β2 = 0.999 and a weight decay of 10^-2. We fine-tune the model using fp16 on 32 NVIDIA A100 GPUs with a total batch size of 192. No gradient accumulation is used. |
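The optimizer settings quoted in the Experiment Setup row can be sketched in PyTorch as follows. This is a hedged illustration of the reported hyperparameters only; the `model` here is a placeholder, since the paper's actual architecture and training code are not released.

```python
import torch

# Placeholder module standing in for the paper's model (not released).
model = torch.nn.Linear(8, 8)

# Optimizer configuration as reported: AdamW with a fixed learning rate
# of 10^-5, betas (0.9, 0.999), and weight decay 10^-2.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-5,
    betas=(0.9, 0.999),
    weight_decay=1e-2,
)

print(optimizer.defaults["lr"], optimizer.defaults["weight_decay"])
```

The paper additionally notes fp16 training with a total batch size of 192 across 32 A100 GPUs and no gradient accumulation; those distributed-training details are omitted from this sketch.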