Make-A-Shape: a Ten-Million-scale 3D Shape Model

Authors: Ka-Hei Hui, Aditya Sanghi, Arianna Rampini, Kamal Rahimi Malekshan, Zhengzhe Liu, Hooman Shayani, Chi-Wing Fu

ICML 2024

Reproducibility assessment: each variable below is listed with its result and the supporting LLM response.
Research Type: Experimental. The results in Table 3 reveal that our single-view model surpasses all the existing baselines (Point-E (Nichol et al., 2022), Shap-E (Jun & Nichol, 2023), and One-2-3-45 (Liu et al., 2023a)) by a significant margin for all three metrics (IoU, LFD, and CD). Note that LFD is a rotation-insensitive metric, indicating that the effectiveness of our approach does not depend on how the generated shapes are aligned with the ground-truth shapes. Compared with the concurrent work OpenLRM (He & Wang, 2024), our model demonstrates similar or better performance across the different metrics, despite having only one tenth of the model parameters (25M vs 260M, see Table 4), highlighting its high efficiency and effectiveness.
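The entry above cites three shape-similarity metrics. As a minimal sketch of the standard definitions of two of them, voxel IoU and Chamfer distance (the paper's evaluation code is not released, so treating shapes as binary occupancy grids and sampled point clouds is an assumption here):

```python
import numpy as np

def voxel_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union between two binary occupancy grids of equal shape."""
    a, b = a.astype(bool), b.astype(bool)
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union > 0 else 1.0

def chamfer_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets p (N, 3) and q (M, 3)."""
    # Brute-force pairwise squared distances; adequate for a few thousand points.
    d2 = ((p[:, None, :] - q[None, :, :]) ** 2).sum(axis=-1)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())
```

LFD (Light Field Distance) compares silhouettes rendered from many viewpoints, which is why it is insensitive to how a generated shape is rotated; it requires a rendering pipeline and is omitted from this sketch.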
Researcher Affiliation: Collaboration. Ka-Hei Hui¹*, Aditya Sanghi²*, Arianna Rampini², Kamal Rahimi Malekshan², Zhengzhe Liu¹, Hooman Shayani², Chi-Wing Fu¹ (¹The Chinese University of Hong Kong, Hong Kong SAR, China; ²Autodesk Research).
Pseudocode: No. The paper describes its methods but does not include any explicit pseudocode or algorithm blocks.
Open Source Code: No. The paper mentions OpenLRM (He & Wang, 2024) as an open-sourced implementation of a concurrent work, but it neither provides access to nor states a release of the code for Make-A-Shape itself.
Open Datasets: Yes. We compile a new, extensive dataset that features over 10 million 3D shapes aggregated from 18 different publicly-available sub-datasets: ModelNet (Vishwanath et al., 2009), ShapeNet (Chang et al., 2015), SMPL (Loper et al., 2015), Thingi10K (Zhou & Jacobson, 2016), SMAL (Zuffi et al., 2017), COMA (Ranjan et al., 2018), House3D (Wu et al., 2018), ABC (Koch et al., 2019), Fusion 360 (Willis et al., 2021), 3D-FUTURE (Fu et al., 2021), BuildingNet (Selvaraju et al., 2021), DeformingThings4D (Li et al., 2021), FG3D (Liu et al., 2021), Toys4K (Stojanov et al., 2021), ABO (Collins et al., 2022), Infinigen (Raistrick et al., 2023), Objaverse (Deitke et al., 2023), and two subsets of Objaverse-XL (Deitke et al., 2023) (Thingiverse and GitHub).
Dataset Splits: Yes. For the data division, we randomly split each sub-dataset into two segments: a training set, which includes 98% of the shapes, and a testing set, which contains the remaining 2%. We then assembled the final training and testing datasets by merging these segmented sets from each sub-dataset. For qualitative evaluation, we used the testing-set shapes to provide the visual results (≈2% of shapes). Due to computational constraints, for quantitative evaluation, we randomly selected 50 shapes from the test set of each sub-dataset. This set-aside collection is denoted the Our Val dataset and is used throughout the remainder of the paper.
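A minimal sketch of the split protocol just quoted (98%/2% per sub-dataset, then 50 test shapes per sub-dataset set aside for quantitative evaluation); the function name and the mapping from sub-dataset names to shape IDs are hypothetical:

```python
import random

def split_and_sample(sub_datasets: dict[str, list[str]], seed: int = 0):
    """sub_datasets maps each sub-dataset name to its list of shape IDs."""
    rng = random.Random(seed)
    train, test, our_val = [], [], []
    for name, shapes in sub_datasets.items():
        shapes = list(shapes)            # copy so shuffling leaves the input intact
        rng.shuffle(shapes)
        cut = int(0.98 * len(shapes))
        train += shapes[:cut]            # 98% of each sub-dataset for training
        sub_test = shapes[cut:]          # remaining 2% for testing
        test += sub_test
        # 50 shapes per sub-dataset for the "Our Val" quantitative set
        our_val += rng.sample(sub_test, min(50, len(sub_test)))
    return train, test, our_val
```

Sampling Our Val per sub-dataset rather than from the merged test set keeps every source represented regardless of its size.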
Hardware Specification: Yes. Our model is trained on 48 A10G GPUs for 2M-4M iterations, depending on the input condition.
Software Dependencies: No. The paper mentions specific optimizers and pre-trained models (e.g., the Adam optimizer, the CLIP L-14 image encoder, PointNet) but does not provide version numbers for the software libraries or dependencies used in the implementation.
Experiment Setup: Yes. Training Details. We train our shape model Make-A-Shape using the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 1e-4 and a batch size of 96. To stabilize the training, we employ an exponential moving average with a decay rate of 0.9999, in line with existing 2D large-scale diffusion models (Rombach et al., 2022). Our model is trained on 48 A10G GPUs for 2M-4M iterations, depending on the input condition. Each model is trained over 20 days, amounting to around 23,000 GPU hours.
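The quoted GPU-hour figure is self-consistent: 48 GPUs × 20 days × 24 h/day = 23,040 ≈ 23,000 GPU hours. Below is a minimal PyTorch sketch of the stated optimization setup (Adam at learning rate 1e-4, batch size 96, EMA decay 0.9999); the network, data, and loss are stand-ins, since the paper's wavelet-tree diffusion model is not public:

```python
import copy
import torch
from torch import nn

# Stand-in network; the paper's diffusion model over wavelet coefficients is not released.
model = nn.Sequential(nn.Linear(64, 256), nn.SiLU(), nn.Linear(256, 64))
ema_model = copy.deepcopy(model)                      # EMA copy of the weights
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
EMA_DECAY = 0.9999

@torch.no_grad()
def update_ema(ema: nn.Module, live: nn.Module, decay: float = EMA_DECAY) -> None:
    """Exponential moving average of weights, as in large-scale 2D diffusion models."""
    for e, p in zip(ema.parameters(), live.parameters()):
        e.mul_(decay).add_(p, alpha=1.0 - decay)

for step in range(100):                               # stand-in for the 2M-4M iterations
    x = torch.randn(96, 64)                           # batch size 96
    loss = nn.functional.mse_loss(model(x), x)        # stand-in training loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    update_ema(ema_model, model)
```

Sampling would then use ema_model rather than the live weights, the usual convention in the diffusion pipelines the paper cites.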