DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation
Authors: Shentong Mo, Enze Xie, Ruihang Chu, Lewei Yao, Lanqing Hong, Matthias Nießner, Zhenguo Li
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the ShapeNet dataset demonstrate that the proposed DiT-3D achieves state-of-the-art performance in high-fidelity and diverse 3D point cloud generation. In particular, our DiT-3D decreases the 1-Nearest Neighbor Accuracy of the state-of-the-art method by 4.59 and increases the Coverage metric by 3.51 when evaluated on Chamfer Distance. |
| Researcher Affiliation | Collaboration | Shentong Mo¹, Enze Xie², Ruihang Chu³, Lewei Yao², Lanqing Hong², Matthias Nießner⁴, Zhenguo Li²; ¹MBZUAI, ²Huawei Noah's Ark Lab, ³CUHK, ⁴TUM |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. The methods are described in narrative text and illustrated with a diagram in Figure 2. |
| Open Source Code | No | The paper includes a link to a project page (https://DiT-3D.github.io) but does not explicitly state that the source code for the described methodology is released or available at this link. The only direct GitHub link (https://github.com/facebookresearch/DiT/tree/main/diffusion) refers to pre-trained DiT-2D checkpoints, not the DiT-3D code presented in this paper. |
| Open Datasets | Yes | Following most previous works [12, 13], we use ShapeNet [38] Chair, Airplane, and Car as our primary datasets for 3D shape generation. |
| Dataset Splits | Yes | We also use the same dataset splits and pre-processing in PointFlow [9] |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as particular GPU models (e.g., NVIDIA A100), CPU models, or cloud computing instance types with their specifications. |
| Software Dependencies | No | The paper states, "Our implementation is based on the PyTorch [39] framework." However, it does not provide specific version numbers for PyTorch or any other software dependencies needed to reproduce the experiments. |
| Experiment Setup | Yes | The input voxel size is 32 × 32 × 32 × 3, i.e., V = 32. ... The models were trained for 10,000 epochs using the Adam optimizer [40] with a learning rate of 1e-4 and a batch size of 128. We set T = 1000 for experiments. In the default setting, we use S/4 with patch size p = 4 as the backbone. Note that we utilize 3D window attention in partial blocks (i.e., 0, 3, 6, 9) and global attention in other blocks. (A hedged configuration sketch based on these reported values follows the table.) |
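
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is a minimal illustration assuming a PyTorch-style training script; the `DiT3D` class name, its constructor arguments, and the commented-out optimizer call are hypothetical placeholders, and only the numeric values come from the paper.

```python
# Hedged configuration sketch; only the values below are taken from the paper's
# reported setup. Any model/optimizer wiring shown in comments is hypothetical.
config = {
    "voxel_size": 32,                         # input voxels of shape 32 x 32 x 32 x 3 (V = 32)
    "patch_size": 4,                          # S/4 backbone with patch size p = 4
    "diffusion_steps": 1000,                  # T = 1000
    "window_attention_blocks": (0, 3, 6, 9),  # 3D window attention in these blocks, global attention elsewhere
    "learning_rate": 1e-4,                    # Adam optimizer
    "batch_size": 128,
    "epochs": 10_000,
}

# Hypothetical usage (class name and signature are assumptions, not the released API):
# model = DiT3D(voxel_size=config["voxel_size"], patch_size=config["patch_size"],
#               window_attention_blocks=config["window_attention_blocks"])
# optimizer = torch.optim.Adam(model.parameters(), lr=config["learning_rate"])
```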