Sin3DM: Learning a Diffusion Model from a Single 3D Textured Shape
Authors: Rundi Wu, Ruoshi Liu, Carl Vondrick, Changxi Zheng
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Through extensive qualitative and quantitative evaluation, we show that our method outperforms prior methods in generation quality of 3D shapes." Also: Section 4 (Experiments) and "Table 1: Quantitative comparison." |
| Researcher Affiliation | Academia | Columbia University {rundi,rliu,vondrick,cxz}@cs.columbia.edu |
| Pseudocode | No | The paper describes the method using text and diagrams but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | "We also include the source code in the supplementary materials." Project page: https://sin3dm.github.io/ |
| Open Datasets | Yes | "Trained on a single 3D textured shape (left), Sin3DM is able to produce diverse new samples, possibly of different sizes and aspect ratios." Figure 1 cites the source assets: top: acropolis (choly kurd, 2021); bottom: industry house (Lukas carnota, 2015). |
| Dataset Splits | No | The paper describes training parameters and evaluation metrics, but it does not specify a separate validation dataset split. The model is trained on a single 3D shape, and evaluation is done by generating new samples from that single shape. |
| Hardware Specification | Yes | With the above settings, the training usually takes 2-3 hours on an NVIDIA RTX A6000. |
| Software Dependencies | No | The paper mentions the 'AdamW optimizer' but does not specify versions for core software dependencies like Python, PyTorch, or TensorFlow, nor other libraries. |
| Experiment Setup | Yes | The input 3D grid has a resolution of 256, i.e., max(H, W, D) = 256, and the signed distance threshold ϵ_d is set to 3/256. The encoded triplane latent has a spatial resolution of 128, i.e., max(H′, W′, D′) = 128, and the number of channels C = 12. We train the triplane auto-encoder for 25000 iterations using the AdamW optimizer (Loshchilov & Hutter, 2017) with an initial learning rate of 5e-3 and a batch size of 2^16. The triplane latent diffusion model has a max time step T = 1000. We train it for 25000 iterations using the AdamW optimizer with an initial learning rate of 5e-3 and a batch size of 32. (These settings are collected into the configuration sketch below.) |
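To make the reported settings easier to scan, the hyperparameters quoted in the Experiment Setup row are collected into a small configuration sketch. The dataclass and field names are hypothetical and are not taken from the authors' released code; only the values mirror the quoted text.

```python
# Minimal sketch of the reported Sin3DM training hyperparameters.
# Class and field names are hypothetical; values follow the paper's setup row.
from dataclasses import dataclass


@dataclass
class TriplaneAutoEncoderConfig:
    input_resolution: int = 256      # max(H, W, D) of the input SDF grid
    sdf_threshold: float = 3 / 256   # signed distance threshold eps_d
    latent_resolution: int = 128     # max(H', W', D') of the triplane latent
    latent_channels: int = 12        # C
    iterations: int = 25_000
    optimizer: str = "AdamW"
    learning_rate: float = 5e-3
    batch_size: int = 2 ** 16        # per-iteration batch, as reported


@dataclass
class LatentDiffusionConfig:
    max_timestep: int = 1000         # T
    iterations: int = 25_000
    optimizer: str = "AdamW"
    learning_rate: float = 5e-3
    batch_size: int = 32


if __name__ == "__main__":
    # Print the two configs side by side for a quick sanity check.
    print(TriplaneAutoEncoderConfig())
    print(LatentDiffusionConfig())
```

Note that both stages share the same optimizer, learning rate, and iteration count; only the batch sizes differ, which is consistent with the reported 2-3 hour training time on a single NVIDIA RTX A6000.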