Meta 3D AssetGen: Text-to-Mesh Generation with High-Quality Geometry, Texture, and PBR Materials

Authors: Yawar Siddiqui, Tom Monnier, Filippos Kokkinos, Mahendra Kariya, Yanir Kleiman, Emilien Garreau, Oran Gafni, Natalia Neverova, Andrea Vedaldi, Roman Shapovalov, David Novotny

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We present Meta 3D AssetGen (AssetGen), a significant advancement in text-to-3D generation which produces faithful, high-quality meshes with texture and material control. Compared to works that bake shading into the 3D object's appearance, AssetGen outputs physically-based rendering (PBR) materials, supporting realistic relighting. AssetGen first generates several views of the object with separate shaded and albedo appearance channels, and then reconstructs colours, metalness and roughness in 3D, using a deferred shading loss for efficient supervision. It also uses a signed distance function to represent 3D shape more reliably and introduces a corresponding loss for direct shape supervision. This is implemented using fused kernels for high memory efficiency. After mesh extraction, a texture refinement transformer operating in UV space significantly improves sharpness and detail. AssetGen achieves a 17% improvement in Chamfer Distance and a 40% improvement in LPIPS over the best concurrent work for few-view reconstruction, and a human preference of 72% over the best industry competitors of comparable speed, including those that support PBR. Project page with generated assets: https://assetgen.github.io. A hedged sketch of such a deferred-shading loss is given after the table.
Researcher Affiliation Collaboration Yawar Siddiqui, Tom Monnier*, Filippos Kokkinos*, Mahendra Kariya, Yanir Kleiman, Emilien Garreau, Oran Gafni, Natalia Neverova, Andrea Vedaldi, Roman Shapovalov*, David Novotny*. Affiliations: Gen AI, Meta; TU Munich (intern with Meta). *Core technical contributors.
Pseudocode No The paper describes methods but does not provide structured pseudocode or algorithm blocks.
Open Source Code No We do not currently plan to open-source the method or training data.
Open Datasets Yes Our training data consists of 140,000 meshes of diverse semantic categories created by 3D artists. [...] We tackle the sparse-view reconstruction task of predicting a 3D mesh from 4 posed images of an object on a subset of 332 meshes from Google Scanned Objects (GSO) [19].
Dataset Splits No The paper mentions training and testing, but does not explicitly provide validation dataset split information (e.g., percentages or counts) or reference standard splits that include a validation set.
Hardware Specification Yes Training spans a total of 2 days, employing 32 A100 GPUs with a total batch size of 128 and a learning rate of 10^-5. [...] we train on 64 NVIDIA A100 GPUs, yielding an effective batch size of 3 × 4 × 64 = 768 images.
Software Dependencies No Therefore, to support large batch sizes, image resolution, and denser point sampling on rays, we implement the direct SDF loss using custom Triton [67] kernels. While Triton is mentioned, a specific version number is not provided.
Experiment Setup Yes Training spans a total of 2 days, employing 32 A100 GPUs with a total batch size of 128 and a learning rate of 10^-5. [...] The total loss has been optimized using Adam [41] with learning rate 10^-4 for 13K steps.
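
The training figures quoted in the Hardware Specification and Experiment Setup rows can be read together as a small configuration sketch, shown below. The variable names, the interpretation of the 3 × 4 × 64 product, and the placeholder `texture_refiner` module are assumptions for illustration only, not the paper's code.

```python
import torch

# Reconstruction-stage figures quoted above: 64 NVIDIA A100 GPUs and an
# effective batch size of 3 x 4 x 64 = 768 images. The factor names below are
# guesses about what the product means, not the paper's terminology.
views_per_sample = 3
samples_per_gpu = 4
num_gpus = 64
effective_batch_size = views_per_sample * samples_per_gpu * num_gpus
assert effective_batch_size == 768

# Texture-refinement stage quoted above: Adam with learning rate 1e-4 for
# 13K steps. `texture_refiner` is a hypothetical placeholder module.
texture_refiner = torch.nn.Linear(16, 16)
optimizer = torch.optim.Adam(texture_refiner.parameters(), lr=1e-4)
total_steps = 13_000
```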
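
The abstract quoted in the Research Type row mentions predicting albedo, metalness and roughness and supervising them with a deferred shading loss. The snippet below is a minimal, hypothetical PyTorch sketch of that idea: it shades the predicted PBR channels under a single distant light with a simplified diffuse-plus-specular model and compares the result to a ground-truth shaded rendering. The function name, shading model, and light setup are assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def deferred_shading_loss(albedo, metalness, roughness, normals, light_dir, gt_shaded):
    """Hypothetical sketch of a deferred shading loss.

    albedo:    (B, 3, H, W) predicted base colour in [0, 1]
    metalness: (B, 1, H, W) predicted metalness in [0, 1]
    roughness: (B, 1, H, W) predicted roughness in [0, 1]
    normals:   (B, 3, H, W) unit surface normals
    light_dir: (3,) direction towards a single distant light (assumed setup)
    gt_shaded: (B, 3, H, W) ground-truth shaded rendering
    """
    light_dir = F.normalize(light_dir, dim=0).view(1, 3, 1, 1)
    # Lambertian diffuse term; metallic surfaces contribute no diffuse colour.
    n_dot_l = (normals * light_dir).sum(dim=1, keepdim=True).clamp(min=0.0)
    diffuse = albedo * (1.0 - metalness) * n_dot_l
    # Crude specular term: lower roughness gives a tighter, stronger highlight.
    shininess = (1.0 - roughness) * 64.0 + 1.0
    specular = metalness * n_dot_l.pow(shininess)
    shaded = diffuse + specular
    return F.mse_loss(shaded, gt_shaded)

# Example usage with random tensors (shapes only; the values are meaningless).
B, H, W = 2, 64, 64
loss = deferred_shading_loss(
    torch.rand(B, 3, H, W), torch.rand(B, 1, H, W), torch.rand(B, 1, H, W),
    F.normalize(torch.randn(B, 3, H, W), dim=1),
    torch.tensor([0.0, 0.0, 1.0]), torch.rand(B, 3, H, W),
)
```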