3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation
Authors: Zutao Jiang, Guansong Lu, Xiaodan Liang, Jihua Zhu, Wei Zhang, Xiaojun Chang, Hang Xu
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the largest 3D object dataset (i.e., ABO) are conducted to verify that 3D-TOGO can better generate high-quality 3D objects according to the input captions across 98 different categories, in terms of PSNR, SSIM, LPIPS and CLIP-score, compared with text-NeRF and Dreamfields. |
| Researcher Affiliation | Collaboration | Zutao Jiang 1,6*, Guansong Lu 2*, Xiaodan Liang 3,4, Jihua Zhu 1, Wei Zhang 2, Xiaojun Chang 5, Hang Xu 2; 1 School of Software Engineering, Xi'an Jiaotong University; 2 Huawei Noah's Ark Lab; 3 Sun Yat-sen University; 4 MBZUAI; 5 ReLER, AAII, University of Technology Sydney; 6 Peng Cheng Laboratory |
| Pseudocode | No | The paper describes the system architecture and components (e.g., in the 'Method' section and Figure 2), but it does not include any formal pseudocode blocks or algorithms labeled as such. |
| Open Source Code | No | The paper mentions 'We use the code opensourced by the authors' in reference to baseline methods (text-NeRF and Dreamfields), but it does not provide any statement or link indicating that the source code for their own 3D-TOGO model is publicly available. |
| Open Datasets | Yes | Our approach is evaluated on Amazon-Berkeley Objects (ABO) (Collins et al. 2022), a large-scale dataset containing nearly 8,000 real household objects from 98 categories with their corresponding natural language descriptions. |
| Dataset Splits | Yes | We randomly split 80%, 10%, 10% objects as our training, validation, and test set, respectively. |
| Hardware Specification | No | The paper mentions software details like 'implement our algorithm with PyTorch' and 'AdamW optimizer', but it does not provide any specific hardware details such as GPU/CPU models, processor types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions software components such as 'PyTorch', 'AdamW optimizer', 'VQGAN', 'CLIP model', 'pixelNeRF', and 'MindSpore', but it does not specify any version numbers for these software dependencies, which are necessary for full reproducibility. |
| Experiment Setup | Yes | The hyper-parameters λpose, λtxt, λprior, λimg, λpixel, λcaption and λcontrastive are set to 0.1, 0.1, 0.1, 0.6, 1, 1 and 1, respectively. For our text-to-views generation module, we use the AdamW optimizer to train 20 epochs. For the views-to-3D generation module, we use the Adam optimizer to train 100 epochs and randomly select 9 views during each training step. More details are provided in the Appendix. |
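Since no code is released, the reported 80%/10%/10% random split could be reproduced along these lines; the function name, seed, and object count (~8,000 ABO objects) are assumptions for illustration, not the authors' implementation.

```python
import random

def split_objects(object_ids, seed=0):
    """Randomly split object IDs 80/10/10 into train/val/test sets,
    mirroring the split protocol reported in the paper.
    The seed is a hypothetical choice; the paper does not state one."""
    ids = list(object_ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_train = int(0.8 * n)   # 80% training
    n_val = int(0.1 * n)     # 10% validation
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # remaining ~10% test
    return train, val, test

# With ~8,000 ABO objects this yields roughly 6400/800/800 objects.
train, val, test = split_objects(range(8000))
```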
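The reported λ weights imply a weighted-sum training objective. A minimal sketch of that combination is below; the term names and the `total_loss` helper are hypothetical stand-ins keyed to the weights quoted from the paper, since the actual loss code is not public.

```python
# Weights as reported in the paper's experiment setup.
LOSS_WEIGHTS = {
    "pose": 0.1,
    "txt": 0.1,
    "prior": 0.1,
    "img": 0.6,
    "pixel": 1.0,
    "caption": 1.0,
    "contrastive": 1.0,
}

def total_loss(terms):
    """Combine per-term loss values into the overall training loss
    as a weighted sum. `terms` maps a loss name (a key of
    LOSS_WEIGHTS) to its scalar value for the current batch."""
    return sum(LOSS_WEIGHTS[name] * value for name, value in terms.items())

# e.g. a batch where only the pose and image losses are non-zero:
# total_loss({"pose": 2.0, "img": 1.0}) is approximately 0.8
```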