Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation

Authors: Zibo Zhao, Wen Liu, Xin Chen, Xianfang Zeng, Rui Wang, Pei Cheng, Bin Fu, Tao Chen, Gang Yu, Shenghua Gao

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform extensive experiments on a standard 3D shape generation benchmark, ShapeNet [11], and a further collected 3D Cartoon Monster dataset with geometric details to validate the effectiveness of our proposed method.
Researcher Affiliation | Collaboration | 1 ShanghaiTech University; 2 Tencent PCG, China; 3 School of Information Science and Technology, Fudan University, China; 4 Shanghai Engineering Research Center of Intelligent Vision and Imaging; 5 Shanghai Engineering Research Center of Energy Efficient and Custom AI IC
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | https://github.com/NeuralCarver/Michelangelo and 'All codes will be publicly available.'
Open Datasets | Yes | We use a standard benchmark, ShapeNet [11], to evaluate our model, which provides about 50K manufactured meshes in 55 categories.
Dataset Splits | Yes | We follow the train/val/test protocol of 3DILG [70]. We further collected 811 Cartoon Monster 3D shapes with detailed structures, with 615 shapes for training, 71 for validation, and 125 for testing.
Hardware Specification | Yes | Our framework is implemented with PyTorch [44], and we train both the SITA-VAE and ASLDM models with 8 Tesla V100 GPUs for around 5 days.
Software Dependencies | No | The paper mentions 'implemented with PyTorch [44]' but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | All attention modules are transformer-style [62] with a multi-head attention mechanism (12 heads, 64 dimensions per head), Layer Normalization (Pre-Norm) [3], a Feed-Forward Network (3072 dimensions) [62], and GELU activation [17]. The learnable query embeddings are E ∈ R^{513×768}... Both models use an AdamW-based gradient descent optimizer [34] with a 1e-4 learning rate. Our framework is implemented with PyTorch [44], and we train both the SITA-VAE and ASLDM models with 8 Tesla V100 GPUs for around 5 days. We use the DDIM sampling scheduler [60] with 50 steps...
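For concreteness, the reported attention and optimizer settings map directly onto standard PyTorch modules. The sketch below is illustrative only and assumes nothing beyond the figures quoted above; names such as `attention_block` and `query_embed` are placeholders, not identifiers from the authors' released code.

```python
# Minimal PyTorch sketch of the reported configuration (illustrative only;
# module and variable names are placeholders, not the authors' identifiers).
import torch
import torch.nn as nn

D_MODEL = 12 * 64   # 12 heads x 64 dimensions per head = 768
N_HEADS = 12
FFN_DIM = 3072      # Feed-Forward Network width
NUM_QUERIES = 513   # learnable query embeddings E in R^{513 x 768}

# One Pre-Norm transformer block with GELU, matching the quoted settings.
attention_block = nn.TransformerEncoderLayer(
    d_model=D_MODEL,
    nhead=N_HEADS,
    dim_feedforward=FFN_DIM,
    activation="gelu",
    norm_first=True,   # Pre-Norm Layer Normalization
    batch_first=True,
)

# Learnable query embeddings, one row per latent query token.
query_embed = nn.Parameter(torch.randn(NUM_QUERIES, D_MODEL))

# AdamW optimizer with the reported 1e-4 learning rate.
optimizer = torch.optim.AdamW(
    list(attention_block.parameters()) + [query_embed], lr=1e-4
)
```

The DDIM sampling scheduler with 50 steps applies only at inference time of the latent diffusion model and is not covered by this sketch.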