Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers

Authors: Yuchen Lin, Chenguo Lin, Panwang Pan, Honglei Yan, Yiqiang Feng, Yadong Mu, Katerina Fragkiadaki

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that PARTCRAFTER outperforms existing approaches in generating decomposable 3D meshes, including parts that are not directly visible in input images, demonstrating the strength of part-aware generative priors for 3D understanding and synthesis. Our experiments aim to answer the following questions: (1) How does PARTCRAFTER perform in part-level reconstruction of objects and scenes compared to existing state-of-the-art models that first segment and then reconstruct parts at the object and scene level? (2) Can PARTCRAFTER reconstruct parts that are not visible in the image prompt? (3) How do results vary with different numbers of parts? (4) What are the contributions of design choices in our local-global denoising transformer? Baselines: To the best of our knowledge, PARTCRAFTER is the first work to generate 3D part-level object meshes from a single image. Recent works Part123 [4] and PartGen [5] reconstruct 3D neural fields [21, 54] from images, which are not directly comparable to our work that focuses on meshes. We consider the following baselines: (1) HoloPart [6] on object level, which is a concurrent work that first segments a given 3D object mesh and then completes the coarse-segmented parts into fine-grained meshes. (2) MIDI [7] on scene level, which reconstructs multi-instance 3D scenes using object segmentation prompts.
Researcher Affiliation | Collaboration | Yuchen Lin¹,³, Chenguo Lin¹, Panwang Pan², Honglei Yan², Yiqiang Feng², Yadong Mu¹, Katerina Fragkiadaki³. Equal contribution. Project lead. ¹Peking University, ²ByteDance, ³Carnegie Mellon University
Pseudocode | No | The paper describes the method and architecture through textual explanations and diagrams (e.g., Figure 2) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code and training data are released. We will release our code under an MIT license.
Open Datasets | Yes | To support part-level supervision, we curate a new dataset by mining part-level annotations from large-scale 3D object datasets. Our curated dataset merges Objaverse [9], ShapeNet [10], and the Amazon Berkeley Objects (ABO) dataset [11], resulting in a rich collection of part-annotated 3D models suitable for learning compositional generation. As for scene-level generation, we leverage the existing 3D scene dataset 3D-Front [12] for training. We collect our part-level dataset from Objaverse [9] (ODC-By v1.0 License), ShapeNet-Core [10] (Custom License), and Amazon Berkeley Objects [11] (CC-BY 4.0 License).
Dataset Splits | No | We evaluate PARTCRAFTER on a test set of about 2K data samples.
Hardware Specification | Yes | PARTCRAFTER is trained on 8 H20 GPUs with a batch size of 256 by fully finetuning the pretrained TripoSG [1]. We report the average generation time of objects or scenes with 4 parts in the test set on an H20 GPU.
Software Dependencies | No | The paper mentions using pre-existing models and tools like TripoSG [1], DINOv2 [70], SAMPart3D [80], MIDI [7], and Hunyuan3D-2 [38], as well as GPT-4o for style transfer. However, it does not specify version numbers for general software dependencies such as programming languages, deep learning frameworks (e.g., PyTorch, TensorFlow), or CUDA.
Experiment Setup | Yes | PARTCRAFTER is trained on 8 H20 GPUs with a batch size of 256 by fully finetuning the pretrained TripoSG [1]. We first train a base model for up to 8 parts on our curated part-level object dataset at a learning rate of 1e-4 for 5K iterations. For part-decomposable objects, we then finetune the base model to support up to 16 parts. For object-composed scenes, we further adapt the base model to the 3D-Front [12] dataset for up to 8 objects. Both finetuning processes last for 5K iterations at a reduced learning rate of 5e-5. We include 30% monolithic objects in training for regularization. The whole training process takes about 2 days. We use 512 tokens for each part, which we find is sufficient to represent part geometry and semantics.
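
The training schedule quoted under Experiment Setup can be written down as a small configuration sketch. The numbers below (GPU count, batch size, learning rates, iteration counts, part limits, tokens per part, monolithic fraction) come from the quoted text; everything else is an illustrative assumption and not the authors' released code. In particular, the class and function names are hypothetical, the per-GPU batch size assumes the global batch of 256 is split evenly across the 8 GPUs with no gradient accumulation, and the token bound assumes each part always uses its full 512 tokens.

```python
from dataclasses import dataclass

# Values reported in the paper excerpt.
NUM_GPUS = 8
GLOBAL_BATCH_SIZE = 256
TOKENS_PER_PART = 512
MONOLITHIC_FRACTION = 0.30  # share of monolithic objects used for regularization


@dataclass(frozen=True)
class Stage:
    """One training stage (hypothetical container, not from the paper's code)."""
    name: str
    data: str
    max_parts: int
    learning_rate: float
    iterations: int


# Stage descriptions follow the excerpt; the names are illustrative.
BASE = Stage("base", "curated part-level object dataset", 8, 1e-4, 5_000)
OBJECT_FT = Stage("object_finetune", "part-decomposable objects", 16, 5e-5, 5_000)
SCENE_FT = Stage("scene_finetune", "3D-Front", 8, 5e-5, 5_000)


def per_gpu_batch(global_batch: int = GLOBAL_BATCH_SIZE, gpus: int = NUM_GPUS) -> int:
    """Per-GPU batch size, ASSUMING an even split and no gradient accumulation."""
    if global_batch % gpus != 0:
        raise ValueError("global batch must divide evenly across GPUs")
    return global_batch // gpus


def max_latent_tokens(stage: Stage) -> int:
    """Upper bound on latent tokens per sample: parts times tokens per part."""
    return stage.max_parts * TOKENS_PER_PART


if __name__ == "__main__":
    print(per_gpu_batch())               # 32
    print(max_latent_tokens(BASE))       # 4096
    print(max_latent_tokens(OBJECT_FT))  # 8192
```

Under these assumptions, the 16-part finetuning stage handles up to 8192 latent tokens per sample, twice the bound of the base and scene stages.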