Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation
Authors: Zibo Zhao, Wen Liu, Xin Chen, Xianfang Zeng, Rui Wang, Pei Cheng, Bin Fu, Tao Chen, Gang Yu, Shenghua Gao
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive experiments on a standard 3D shape generation benchmark, ShapeNet [11], and a further collected 3D Cartoon Monster dataset with geometric details to validate the effectiveness of our proposed method. |
| Researcher Affiliation | Collaboration | 1 ShanghaiTech University; 2 Tencent PCG, China; 3 School of Information Science and Technology, Fudan University, China; 4 Shanghai Engineering Research Center of Intelligent Vision and Imaging; 5 Shanghai Engineering Research Center of Energy Efficient and Custom AI IC |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://github.com/NeuralCarver/Michelangelo and 'All codes will be publicly available.' |
| Open Datasets | Yes | We use a standard benchmark, ShapeNet [11], to evaluate our model, which provides about 50K manufactured meshes in 55 categories. |
| Dataset Splits | Yes | We follow the train/val/test protocol with 3DILG [70]. We further collected 811 Cartoon Monster 3D shapes with detailed structures, with 615 shapes for training, 71 for validation, and 125 for testing. |
| Hardware Specification | Yes | Our framework is implemented with PyTorch [44], and we train both the SITA-VAE and ASLDM models with 8 Tesla V100 GPUs for around 5 days. |
| Software Dependencies | No | The paper mentions 'implemented with PyTorch [44]' but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | All attention modules are the transformer [62] style with multi-head attention mechanism (with 12 heads and 64 dimensions per head), Layer Normalization (Pre-Norm) [3], Feed-Forward Network (with 3072 dimensions) [62] and GELU activation [17]. The learnable query embeddings are E ∈ R^{513×768}... Both models use an AdamW-based gradient descent optimizer [34] with a 1e-4 learning rate. Our framework is implemented with PyTorch [44], and we train both the SITA-VAE and ASLDM models with 8 Tesla V100 GPUs for around 5 days. We use the DDIM sampling scheduler [60] with 50 steps... |
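
The reported setup maps onto standard PyTorch building blocks. The following is a minimal, illustrative sketch, not the authors' released code: it assumes `nn.MultiheadAttention` and `nn.LayerNorm` stand in for the paper's attention and Pre-Norm layers, and it reproduces only the reported hyperparameters (12 heads x 64 dims per head = 768 hidden, 3072-dim GELU feed-forward, learnable query embeddings of shape (513, 768), AdamW at a 1e-4 learning rate). The 50-step DDIM sampler belongs to inference and is not shown.

```python
# Illustrative sketch of the reported transformer configuration (assumed mapping,
# not the authors' implementation).
import torch
import torch.nn as nn


class PreNormTransformerBlock(nn.Module):
    def __init__(self, dim: int = 768, num_heads: int = 12, ffn_dim: int = 3072):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)  # Pre-Norm before self-attention
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)  # Pre-Norm before the feed-forward network
        self.ffn = nn.Sequential(
            nn.Linear(dim, ffn_dim),
            nn.GELU(),
            nn.Linear(ffn_dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual attention
        x = x + self.ffn(self.norm2(x))                     # residual FFN
        return x


# Learnable query embeddings with the reported shape (513, 768).
queries = nn.Parameter(torch.randn(513, 768) * 0.02)

block = PreNormTransformerBlock()
optimizer = torch.optim.AdamW(list(block.parameters()) + [queries], lr=1e-4)

# Example forward pass: broadcast the queries over a batch of size 2.
out = block(queries.unsqueeze(0).expand(2, -1, -1))
print(out.shape)  # torch.Size([2, 513, 768])
```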