Instant3D: Fast Text-to-3D with Sparse-view Generation and Large Reconstruction Model

Authors: Jiahao Li, Hao Tan, Kai Zhang, Zexiang Xu, Fujun Luan, Yinghao Xu, Yicong Hong, Kalyan Sunkavalli, Greg Shakhnarovich, Sai Bi

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments, we demonstrate that our method can generate diverse 3D assets of high visual quality within 20 seconds, which is two orders of magnitude faster than previous optimization-based methods that can take 1 to 10 hours. Our project webpage is: https://jiahao.ai/instant3d/.
Researcher Affiliation | Collaboration | Jiahao Li (1,2), Hao Tan (1), Kai Zhang (1), Zexiang Xu (1), Fujun Luan (1), Yinghao Xu (1,3), Yicong Hong (1,4), Kalyan Sunkavalli (1), Greg Shakhnarovich (2), Sai Bi (1); 1: Adobe Research, 2: TTIC, 3: Stanford University, 4: Australian National University
Pseudocode | No | The paper describes its methods in prose and uses architectural diagrams (Figure 2, Figure 3) but does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper states: "Our project webpage is: https://jiahao.ai/instant3d/." While a project webpage might contain links to code, the paper does not explicitly state that the source code for the methodology is openly available or provide a direct repository link.
Open Datasets | Yes | We adopt a large-scale synthetic 3D dataset Objaverse (Deitke et al., 2023b) and render four 512 × 512 views of about 750K objects with Blender. ... We train the model on multi-view renderings of the Objaverse dataset (Deitke et al., 2023b).
Dataset Splits | No | The paper describes selecting a subset of images for input and another for supervision during training, but it does not explicitly define train/validation/test splits, mention a separate validation set for model tuning or early stopping, or report split sizes.
Hardware Specification | Yes | The training is done using 32 NVIDIA A100 GPUs for only 3 hours. ... The timing is measured using the default hyper-parameters of each method on an A100 GPU. ... The model is trained on 128 NVIDIA A100 GPUs and the whole training can be finished in 7 days.
Software Dependencies | No | The paper mentions several software components and frameworks such as Blender, CLIP, BLIP-2, an LLM, DINO-ViT, Flash Attention, scikit-learn, and PyTorch (e.g., "We use the NuSVC implementation from the popular scikit-learn framework"), but it does not specify exact version numbers for any of them.
Experiment Setup | Yes | We use AdamW optimizer with a fixed learning rate 10^-5, β1 = 0.9, β2 = 0.999 and a weight decay of 10^-2. We fine-tune the model using fp16 on 32 NVIDIA A100 GPUs with a total batch size of 192. No gradient accumulation is used.
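The quoted setup maps onto a standard mixed-precision fine-tuning loop. Below is a minimal sketch assuming PyTorch; the stand-in model, data, and loss are placeholders (not the authors' code), and only the hyper-parameters (learning rate 1e-5, betas 0.9/0.999, weight decay 1e-2, fp16, no gradient accumulation) come from the quoted text.

```python
# Minimal sketch of the reported fine-tuning configuration, assuming PyTorch.
# The model, data, and loss below are placeholders; only the hyper-parameters
# (lr=1e-5, betas=(0.9, 0.999), weight_decay=1e-2, fp16, no gradient
# accumulation) are taken from the quoted experiment setup.
import torch
from torch import nn

device = "cuda"  # the fp16 autocast sketched here assumes a CUDA GPU
model = nn.Linear(128, 128).to(device)  # stand-in for the fine-tuned model
batches = [torch.randn(192, 128, device=device)  # total batch size of 192, as reported
           for _ in range(4)]

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-5,             # fixed learning rate 10^-5
    betas=(0.9, 0.999),  # beta1 = 0.9, beta2 = 0.999
    weight_decay=1e-2,   # weight decay 10^-2
)
scaler = torch.cuda.amp.GradScaler()  # fp16 mixed-precision training

for batch in batches:
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(batch).pow(2).mean()  # placeholder loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # one optimizer step per batch: no gradient accumulation
    scaler.update()
```

In the paper's setting this loop would additionally be wrapped in distributed data parallelism across the 32 A100 GPUs; that part is omitted from the sketch.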