Make-A-Shape: a Ten-Million-scale 3D Shape Model
Authors: Ka-Hei Hui, Aditya Sanghi, Arianna Rampini, Kamal Rahimi Malekshan, Zhengzhe Liu, Hooman Shayani, Chi-Wing Fu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The results in Table 3 reveal that our single-view model surpasses all the existing baselines (Point-E (Nichol et al., 2022), Shap-E (Jun & Nichol, 2023), and One-2-3-45 (Liu et al., 2023a)) by a significant margin on all three metrics (IoU, LFD, and CD). Note that LFD is a rotation-insensitive metric, indicating that the effectiveness of our approach does not depend on how the generated shapes are aligned with the ground-truth shapes. Compared with the concurrent work OpenLRM (He & Wang, 2024), our model demonstrates similar or better performance across the metrics despite having only one tenth of the model parameters (25M vs. 260M, see Table 4), highlighting its efficiency and effectiveness. (A minimal sketch of the IoU and CD metrics appears after the table.) |
| Researcher Affiliation | Collaboration | Ka-Hei Hui¹*, Aditya Sanghi²*, Arianna Rampini², Kamal Rahimi Malekshan², Zhengzhe Liu¹, Hooman Shayani², Chi-Wing Fu¹ (¹The Chinese University of Hong Kong, Hong Kong SAR, China; ²Autodesk Research). |
| Pseudocode | No | The paper describes its methods but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions OpenLRM (He & Wang, 2024) as an open-sourced implementation of a concurrent work, but it provides no code link or release statement for Make-A-Shape itself. |
| Open Datasets | Yes | Dataset. We compile a new, extensive dataset that features over 10 million 3D shapes aggregated from 18 different publicly-available sub-datasets: ModelNet (Vishwanath et al., 2009), ShapeNet (Chang et al., 2015), SMPL (Loper et al., 2015), Thingi10K (Zhou & Jacobson, 2016), SMAL (Zuffi et al., 2017), COMA (Ranjan et al., 2018), House3D (Wu et al., 2018), ABC (Koch et al., 2019), Fusion 360 (Willis et al., 2021), 3D-FUTURE (Fu et al., 2021), BuildingNet (Selvaraju et al., 2021), DeformingThings4D (Li et al., 2021), FG3D (Liu et al., 2021), Toys4K (Stojanov et al., 2021), ABO (Collins et al., 2022), Infinigen (Raistrick et al., 2023), Objaverse (Deitke et al., 2023), and two subsets of Objaverse-XL (Deitke et al., 2023): Thingiverse and GitHub. |
| Dataset Splits | Yes | For the data division, we randomly split each sub-dataset into two segments: a training set, which includes 98% of the shapes, and a testing set, which contains the remaining 2%. We then assembled the ultimate training and testing datasets by merging these segmented sets from each sub-dataset. For qualitative evaluation, we utilized the testing-set shapes to provide the visual results (≈2% of shapes). Due to computational constraints, for quantitative evaluation, we randomly selected 50 shapes from the test set of each sub-dataset. This set-aside collection is denoted as the OurVal dataset and will be used throughout the remainder of the paper. (A sketch of this split procedure follows the table.) |
| Hardware Specification | Yes | Our model is trained on 48 A10G GPUs for 2M-4M iterations, depending on the input condition. |
| Software Dependencies | No | The paper mentions specific optimizers and pre-trained models (e.g., the Adam optimizer, the CLIP L-14 image encoder, and PointNet) but does not provide version numbers for the software libraries or dependencies used in the implementation. |
| Experiment Setup | Yes | Training Details. We train our shape model Make-A-Shape using the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 1e-4 and a batch size of 96. To stabilize the training, we employ an exponential moving average with a decay rate of 0.9999, in line with existing 2D large-scale diffusion models (Rombach et al., 2022). Our model is trained on 48 A10G GPUs for 2M-4M iterations, depending on the input condition. Each model is trained over 20 days, amounting to around 23,000 GPU hours. (A training-loop sketch with these hyperparameters follows the table.) |
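The metrics quoted in the Research Type row can be illustrated with a minimal NumPy sketch of voxel IoU and symmetric Chamfer distance (CD). This is our own simplification with hypothetical function names, not the paper's evaluation code; LFD (light-field distance) is omitted because it requires rendering the shapes.

```python
import numpy as np

def voxel_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union between two boolean occupancy grids."""
    a, b = a.astype(bool), b.astype(bool)
    union = (a | b).sum()
    return float((a & b).sum() / union) if union else 1.0

def chamfer_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Symmetric Chamfer distance between (N, 3) and (M, 3) point sets,
    via a dense pairwise-distance matrix (fine for small N and M)."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)  # (N, M)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())
```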
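The per-sub-dataset 98%/2% split and the 50-shape OurVal sampling described in the Dataset Splits row could be reproduced along the following lines. This is a hypothetical sketch (function and variable names are ours), since the paper releases no code.

```python
import random

def make_splits(subsets: dict, train_frac: float = 0.98,
                val_per_subset: int = 50, seed: int = 0):
    """Split each sub-dataset 98%/2%, merge the pieces across sub-datasets,
    and draw 50 random test shapes per sub-dataset as the 'OurVal' set."""
    rng = random.Random(seed)
    train, test, our_val = [], [], []
    for shapes in subsets.values():
        shapes = list(shapes)
        rng.shuffle(shapes)
        cut = int(round(len(shapes) * train_frac))
        train += shapes[:cut]      # 98% of each sub-dataset for training
        test += shapes[cut:]       # remaining 2% for testing
        our_val += rng.sample(shapes[cut:],
                              min(val_per_subset, len(shapes) - cut))
    return train, test, our_val
```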
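Finally, the training configuration in the Experiment Setup row maps onto a short PyTorch skeleton: Adam at a learning rate of 1e-4, batches of 96, and an exponential moving average with decay 0.9999. The model and its `diffusion_loss` call are placeholders for the paper's wavelet-tree diffusion objective, which is not publicly released.

```python
import copy
import torch

def train(model, dataloader, max_steps=2_000_000):
    """Training-loop sketch matching the reported hyperparameters.
    `model.diffusion_loss` is a hypothetical stand-in; the dataloader
    is assumed to yield batches of 96 shapes."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    ema = copy.deepcopy(model).requires_grad_(False)  # EMA copy of weights
    decay = 0.9999
    for step, batch in enumerate(dataloader):
        loss = model.diffusion_loss(batch)            # placeholder objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        with torch.no_grad():                         # EMA update
            for p_ema, p in zip(ema.parameters(), model.parameters()):
                p_ema.mul_(decay).add_(p, alpha=1.0 - decay)
        if step + 1 >= max_steps:
            break
    return ema
```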