3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation

Authors: Zutao Jiang, Guansong Lu, Xiaodan Liang, Jihua Zhu, Wei Zhang, Xiaojun Chang, Hang Xu

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments on the largest 3D object dataset (i.e., ABO) are conducted to verify that 3D-TOGO can better generate high-quality 3D objects according to the input captions across 98 different categories, in terms of PSNR, SSIM, LPIPS and CLIP-score, compared with text-NeRF and Dreamfields.
Researcher Affiliation Collaboration Zutao Jiang 1,6*, Guansong Lu 2*, Xiaodan Liang 3,4, Jihua Zhu 1, Wei Zhang 2, Xiaojun Chang 5, Hang Xu 2 — 1 School of Software Engineering, Xi'an Jiaotong University; 2 Huawei Noah's Ark Lab; 3 Sun Yat-sen University; 4 MBZUAI; 5 ReLER, AAII, University of Technology Sydney; 6 Peng Cheng Laboratory
Pseudocode No The paper describes the system architecture and components (e.g., in the 'Method' section and Figure 2), but it does not include any formal pseudocode blocks or algorithms labeled as such.
Open Source Code No The paper mentions 'We use the code opensourced by the authors' in reference to baseline methods (text-NeRF and Dreamfields), but it does not provide any statement or link indicating that the source code for their own 3D-TOGO model is publicly available.
Open Datasets Yes Our approach is evaluated on Amazon-Berkeley Objects (ABO) (Collins et al. 2022), a large-scale dataset containing nearly 8,000 real household objects from 98 categories with their corresponding natural language descriptions.
Dataset Splits Yes We randomly split 80%, 10%, 10% objects as our training, validation, and test set, respectively.
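The reported 80%/10%/10% random split can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the object count (~8,000, per the ABO description above) and the fixed seed are assumptions for reproducibility of the example.

```python
import random

def split_dataset(object_ids, seed=0):
    """Randomly split object IDs into 80% train, 10% val, 10% test."""
    ids = list(object_ids)
    random.Random(seed).shuffle(ids)  # deterministic shuffle for a fixed seed
    n = len(ids)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test

# Example with a hypothetical 8,000-object dataset:
train, val, test = split_dataset(range(8000))
```

Fixing the shuffle seed is what makes such a split reproducible across runs; the paper itself does not state whether a seed was fixed.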
Hardware Specification No The paper mentions software details like 'implement our algorithm with PyTorch' and 'AdamW optimizer', but it does not provide any specific hardware details such as GPU/CPU models, processor types, or memory used for running the experiments.
Software Dependencies No The paper mentions software components such as 'PyTorch', 'AdamW optimizer', 'VQGAN', 'CLIP model', 'pixelNeRF', and 'MindSpore', but it does not specify any version numbers for these software dependencies, which are necessary for full reproducibility.
Experiment Setup Yes The hyper-parameters λpose, λtxt, λprior, λimg, λpixel, λcaption and λcontrastive are set to 0.1, 0.1, 0.1, 0.6, 1, 1 and 1, respectively. For our text-to-views generation module, we use the AdamW optimizer to train 20 epochs. For the views-to-3D generation module, we use the Adam optimizer to train 100 epochs and randomly select 9 views during each training step. More details are provided in the Appendix.
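The seven reported loss weights imply a weighted-sum training objective. A minimal sketch, assuming the total loss is a plain weighted sum of the named terms (the paper's exact combination is not quoted here, and the loss values below are placeholders):

```python
# Loss weights as reported in the Experiment Setup row above.
LOSS_WEIGHTS = {
    "pose": 0.1, "txt": 0.1, "prior": 0.1,
    "img": 0.6, "pixel": 1.0, "caption": 1.0, "contrastive": 1.0,
}

def total_loss(losses):
    """Weighted sum of the individual loss terms, keyed by name."""
    return sum(LOSS_WEIGHTS[name] * value for name, value in losses.items())

# Placeholder per-term losses, all equal to 1.0 for illustration:
example = {name: 1.0 for name in LOSS_WEIGHTS}
print(total_loss(example))  # 3.9 = 3 * 0.1 + 0.6 + 3 * 1.0
```

Keeping the weights in a single dictionary makes the setting auditable against the paper's stated values when attempting a reproduction.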