Sin3DM: Learning a Diffusion Model from a Single 3D Textured Shape

Authors: Rundi Wu, Ruoshi Liu, Carl Vondrick, Changxi Zheng

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive qualitative and quantitative evaluation, we show that our method outperforms prior methods in generation quality of 3D shapes. Also: Section 4 (Experiments), Table 1: Quantitative comparison.
Researcher Affiliation | Academia | Columbia University, {rundi,rliu,vondrick,cxz}@cs.columbia.edu
Pseudocode | No | The paper describes the method using text and diagrams but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | We also include the source code in the supplementary materials. Also: https://sin3dm.github.io/
Open Datasets | Yes | Trained on a single 3D textured shape (left), Sin3DM is able to produce diverse new samples, possibly of different sizes and aspect ratios. Figure 1 cites its example shapes: acropolis (choly kurd, 2021); bottom: industry house (Lukas carnota, 2015).
Dataset Splits | No | The paper describes training parameters and evaluation metrics, but it does not specify a separate validation split. The model is trained on a single 3D shape, and evaluation is done by generating new samples from that single shape.
Hardware Specification | Yes | With the above settings, the training usually takes 2 to 3 hours on an NVIDIA RTX A6000.
Software Dependencies | No | The paper mentions the AdamW optimizer but does not specify versions for core software dependencies such as Python, PyTorch, or TensorFlow, or for other libraries.
Experiment Setup | Yes | The input 3D grid has a resolution of 256, i.e., max(H, W, D) = 256, and the signed distance threshold ϵ_d is set to 3/256. The encoded triplane latent has a spatial resolution of 128, i.e., max(H', W', D') = 128, and the number of channels is C = 12. We train the triplane auto-encoder for 25000 iterations using the AdamW optimizer (Loshchilov & Hutter, 2017) with an initial learning rate of 5e-3 and a batch size of 2^16. The triplane latent diffusion model has a max time step T = 1000. We train it for 25000 iterations using the AdamW optimizer with an initial learning rate of 5e-3 and a batch size of 32.
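
The settings in the Experiment Setup row can be collected into a small configuration object. The following is a minimal sketch, not the authors' code: the class `Sin3DMSetup`, its field names, and the helper `triplane_shapes` are hypothetical. The auto-encoder batch size reads the extracted "216" as a flattened 2^16, and the helper assumes the latent keeps the input grid's aspect ratio, which the quoted text does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sin3DMSetup:
    """Hyperparameters quoted in the Experiment Setup row (field names are hypothetical)."""
    grid_resolution: int = 256            # max(H, W, D) of the input 3D grid
    sdf_threshold: float = 3 / 256        # signed distance threshold epsilon_d
    latent_resolution: int = 128          # max(H', W', D') of the triplane latent
    latent_channels: int = 12             # C
    ae_iterations: int = 25_000           # triplane auto-encoder training steps
    ae_learning_rate: float = 5e-3        # initial learning rate (AdamW)
    ae_batch_size: int = 2 ** 16          # assumption: "216" is a flattened 2^16
    diffusion_timesteps: int = 1000       # max time step T
    diffusion_iterations: int = 25_000    # latent diffusion training steps
    diffusion_learning_rate: float = 5e-3 # initial learning rate (AdamW)
    diffusion_batch_size: int = 32


def triplane_shapes(grid_hwd, setup=Sin3DMSetup()):
    """Return (channels, rows, cols) for the three axis-aligned latent planes.

    Assumption: the latent is a uniformly scaled copy of the input grid whose
    longest side equals setup.latent_resolution.
    """
    h, w, d = grid_hwd
    scale = setup.latent_resolution / max(h, w, d)
    hl, wl, dl = (max(1, round(side * scale)) for side in (h, w, d))
    c = setup.latent_channels
    return {"hw": (c, hl, wl), "hd": (c, hl, dl), "wd": (c, wl, dl)}
```

For example, `triplane_shapes((256, 128, 64))` gives planes of 128×64, 128×32, and 64×32 cells, each with 12 channels.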