DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation

Authors: Shentong Mo, Enze Xie, Ruihang Chu, Lewei Yao, Lanqing Hong, Matthias Nießner, Zhenguo Li

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the ShapeNet dataset demonstrate that the proposed DiT-3D achieves state-of-the-art performance in high-fidelity and diverse 3D point cloud generation. In particular, our DiT-3D decreases the 1-Nearest Neighbor Accuracy of the state-of-the-art method by 4.59 and increases the Coverage metric by 3.51 when evaluated on Chamfer Distance. (A sketch of these metrics follows the table.)
Researcher Affiliation | Collaboration | Shentong Mo (MBZUAI), Enze Xie (Huawei Noah's Ark Lab), Ruihang Chu (CUHK), Lewei Yao (Huawei Noah's Ark Lab), Lanqing Hong (Huawei Noah's Ark Lab), Matthias Nießner (TUM), Zhenguo Li (Huawei Noah's Ark Lab)
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. The methods are described in narrative text and illustrated with a diagram in Figure 2.
Open Source Code | No | The paper includes a link to a project page (https://DiT-3D.github.io) but does not explicitly state that the source code for the described methodology is released or available at this link. The only direct GitHub link (https://github.com/facebookresearch/DiT/tree/main/diffusion) refers to pre-trained DiT-2D checkpoints, not the DiT-3D code presented in this paper.
Open Datasets | Yes | "Following most previous works [12, 13], we use ShapeNet [38] Chair, Airplane, and Car as our primary datasets for 3D shape generation."
Dataset Splits | Yes | "We also use the same dataset splits and pre-processing in PointFlow [9]."
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as particular GPU models (e.g., NVIDIA A100), CPU models, or cloud computing instance types with their specifications.
Software Dependencies | No | The paper states, "Our implementation is based on the PyTorch [39] framework." However, it does not provide specific version numbers for PyTorch or any other software dependencies needed to reproduce the experiments.
Experiment Setup | Yes | "The input voxel size is 32 × 32 × 32 × 3, i.e., V = 32. ... The models were trained for 10,000 epochs using the Adam optimizer [40] with a learning rate of 1e-4 and a batch size of 128. We set T = 1000 for experiments. In the default setting, we use S/4 with patch size p = 4 as the backbone. Note that we utilize 3D window attention in partial blocks (i.e., 0, 3, 6, 9) and global attention in other blocks." (A hedged configuration sketch follows the table.)
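
The Experiment Setup row quotes concrete hyperparameters (V = 32, p = 4, 10,000 epochs, Adam with lr 1e-4, batch size 128, T = 1000, window attention in blocks 0, 3, 6, 9). The sketch below shows how those reported values could be wired into a generic PyTorch DDPM-style training loop; the model interface, dataloader, and linear beta schedule are assumptions for illustration, not the authors' released implementation.

```python
# Minimal sketch, assuming a generic PyTorch DDPM setup with the paper's reported
# hyperparameters. The model signature model(x_t, t) and the dataloader are
# hypothetical; the paper does not specify the noise schedule in the quoted text.
import torch
from dataclasses import dataclass

@dataclass
class TrainConfig:
    voxel_size: int = 32                  # V = 32, input volume 32 x 32 x 32 x 3
    patch_size: int = 4                   # S/4 backbone with patch size p = 4
    window_blocks: tuple = (0, 3, 6, 9)   # blocks using 3D window attention
    diffusion_steps: int = 1000           # T = 1000
    epochs: int = 10_000
    batch_size: int = 128
    lr: float = 1e-4

def train(model: torch.nn.Module, dataloader, cfg: TrainConfig, device: str = "cuda"):
    """Standard noise-prediction diffusion training loop with the reported optimizer settings."""
    optimizer = torch.optim.Adam(model.parameters(), lr=cfg.lr)
    # Linear beta schedule: a common DDPM default, assumed here for completeness.
    betas = torch.linspace(1e-4, 0.02, cfg.diffusion_steps, device=device)
    alphas_bar = torch.cumprod(1.0 - betas, dim=0)

    for epoch in range(cfg.epochs):
        for x0 in dataloader:                          # x0: voxelized point clouds, shape (B, 3, V, V, V)
            x0 = x0.to(device)
            t = torch.randint(0, cfg.diffusion_steps, (x0.shape[0],), device=device)
            noise = torch.randn_like(x0)
            a_bar = alphas_bar[t].view(-1, 1, 1, 1, 1)
            xt = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise   # forward process q(x_t | x_0)
            pred = model(xt, t)                                    # predict the added noise
            loss = torch.nn.functional.mse_loss(pred, noise)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```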
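
The Research Type row quotes 1-Nearest Neighbor Accuracy (1-NNA) and Coverage (COV) evaluated under Chamfer Distance. The sketch below follows the standard definitions of these point-cloud generation metrics (1-NNA is ideally near 50%, higher COV is better); it is illustrative only and not the authors' evaluation code.

```python
# Hedged sketch of the metrics named in the results row, using their standard definitions.
import numpy as np

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer Distance between point clouds a (N, 3) and b (M, 3)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1) ** 2   # (N, M) squared distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def pairwise_cd(set_a, set_b) -> np.ndarray:
    """Matrix of Chamfer Distances between every cloud in set_a and every cloud in set_b."""
    return np.array([[chamfer_distance(a, b) for b in set_b] for a in set_a])

def one_nna_and_cov(generated, reference):
    """1-NN Accuracy (two-sample test, 50% is ideal) and Coverage under Chamfer Distance."""
    n_g, n_r = len(generated), len(reference)
    d_gg = pairwise_cd(generated, generated)
    d_rr = pairwise_cd(reference, reference)
    d_gr = pairwise_cd(generated, reference)
    np.fill_diagonal(d_gg, np.inf)   # a sample cannot be its own nearest neighbor
    np.fill_diagonal(d_rr, np.inf)
    # 1-NNA: fraction of samples whose nearest neighbor comes from their own set.
    gen_correct = (d_gg.min(axis=1) < d_gr.min(axis=1)).sum()
    ref_correct = (d_rr.min(axis=1) < d_gr.min(axis=0)).sum()
    one_nna = (gen_correct + ref_correct) / (n_g + n_r)
    # COV: fraction of reference clouds that are the nearest neighbor of at least one generated cloud.
    cov = len(set(d_gr.argmin(axis=1))) / n_r
    return one_nna, cov
```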