Generating Images with 3D Annotations Using Diffusion Models
Authors: Wufei Ma, Qihao Liu, Jiahao Wang, Angtian Wang, Xiaoding Yuan, Yi Zhang, Zihao Xiao, Guofeng Zhang, Beijia Lu, Ruxiao Duan, Yongrui Qi, Adam Kortylewski, Yaoyao Liu, Alan Yuille
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our method through extensive experiments on ImageNet-100/200, ImageNet-R, PASCAL3D+, ObjectNet3D, and OOD-CV. |
| Researcher Affiliation | Academia | ¹Johns Hopkins University ²University of Freiburg ³Max Planck Institute for Informatics, Saarland Informatics Campus |
| Pseudocode | Yes | Algorithm 1 Generating images using our 3D-DST |
| Open Source Code | Yes | Our code is available at https://ccvl.jhu.edu/3D-DST/ |
| Open Datasets | Yes | We demonstrate the effectiveness of our method through extensive experiments on ImageNet-100/200, ImageNet-R, PASCAL3D+, ObjectNet3D, and OOD-CV. [...] We generate images of the 3D objects taken from 3D shape repositories (e.g., ShapeNet and Objaverse) (see the generation sketch below the table) |
| Dataset Splits | Yes | The PASCAL3D+ dataset (Xiang et al., 2014) contains 11,045 training images and 10,812 validation images with category and object pose annotations. [...] By randomly splitting the synthetic data into training sets and validation sets, the pose consistency score (PCS) is given by a model's average performance on the validation sets after fitting the corresponding training sets. (See the PCS sketch below the table.) |
| Hardware Specification | Yes | We use two NVIDIA Quadro RTX 8000 GPUs for each training. [...] Each NeMo model is trained on four NVIDIA RTX A5000 for 10 hours. |
| Software Dependencies | No | The paper mentions software such as Blender, the AdamW optimizer, and frameworks such as ControlNet and LLaMA, but does not provide specific version numbers for these or any other software libraries or dependencies. |
| Experiment Setup | Yes | During training, the input image size is 224×224, the batch size is set to 512, and we use AdamW (Loshchilov & Hutter, 2019) as the optimizer. The initial learning rate is 1e-4 and we use a cosine scheduler for learning rate decay. We use RandAugment (Cubuk et al., 2020), random erasing (Zhong et al., 2020), Mixup (Zhang et al., 2018), and CutMix (Yun et al., 2019) for data augmentation. [...] For the classification-based model, ResNet50, we use the released implementation from Zhou et al. (2018) and train the model for 90 epochs with a learning rate of 0.01. For NeMo, we adopt the publicly released code of Wang et al. (2021) and train the NeMo models for 800 epochs on both the synthetic and real data. (See the training-recipe sketch below the table.) |
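The pseudocode and open-dataset rows describe the 3D-DST generation pipeline: render 3D CAD models from ShapeNet/Objaverse at known camera poses, then condition a diffusion model on the rendered control signal so the sampled image inherits the rendering pose as its 3D annotation. Below is a minimal sketch of that idea using the Hugging Face `diffusers` ControlNet API; the checkpoint names, the edge-map path, and the prompt are illustrative assumptions, not the exact models or prompts used in the paper.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Canny-edge ControlNet + Stable Diffusion backbone; both checkpoint names
# are illustrative stand-ins, not the checkpoints reported in the paper.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Edge map rendered (e.g., in Blender) from a CAD model at a known camera
# pose; that pose becomes the generated image's 3D annotation. Placeholder path.
edge_map = load_image("renders/car_az030_el10_edges.png")

image = pipe(
    prompt="a photo of a car on a city street",  # illustrative; the paper uses LLM-generated prompts
    image=edge_map,
    num_inference_steps=30,
).images[0]
image.save("dst_car_az030_el10.png")
```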
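The dataset-splits row quotes the pose consistency score (PCS): the average validation performance of a model over random train/validation splits of the synthetic data. Here is a minimal sketch of that protocol, assuming generic `fit` and `evaluate` callables and an illustrative number of splits (neither is specified in the quoted text):

```python
import random
from typing import Callable, List, Sequence

def pose_consistency_score(
    samples: Sequence,
    fit: Callable[[List], object],
    evaluate: Callable[[object, List], float],
    n_splits: int = 5,      # illustrative; the quoted text does not state a count
    val_frac: float = 0.2,  # illustrative validation fraction
    seed: int = 0,
) -> float:
    """Average validation score over random train/val splits of `samples`."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_splits):
        shuffled = list(samples)
        rng.shuffle(shuffled)
        n_val = int(len(shuffled) * val_frac)
        val, train = shuffled[:n_val], shuffled[n_val:]
        model = fit(train)                   # fit a pose model on the training split
        scores.append(evaluate(model, val))  # score it on the held-out split
    return sum(scores) / len(scores)
```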
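The experiment-setup row quotes a fairly standard recipe: 224×224 inputs, batch size 512, AdamW at an initial learning rate of 1e-4 with cosine decay, and RandAugment/random-erasing augmentation (Mixup and CutMix are applied per batch). A minimal PyTorch sketch of that recipe follows; the model choice, dataset path, epoch count, and the omission of the Mixup/CutMix batch logic are simplifications, not details from the paper.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR
from torchvision import datasets, models, transforms

EPOCHS = 90        # illustrative; the paper trains ResNet50 for 90 epochs in one setting
BATCH_SIZE = 512   # batch size from the quoted setup
IMAGE_SIZE = 224   # 224x224 inputs

# RandAugment + random erasing per the quoted recipe; RandomErasing operates
# on tensors, so it comes after ToTensor. Mixup/CutMix would be applied to
# batches inside the loop and are omitted here for brevity.
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(IMAGE_SIZE),
    transforms.RandAugment(),
    transforms.ToTensor(),
    transforms.RandomErasing(),
])

dataset = datasets.ImageFolder("path/to/train", transform=train_tf)  # placeholder path
loader = torch.utils.data.DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.resnet50(num_classes=100).to(device)     # e.g., ImageNet-100
optimizer = AdamW(model.parameters(), lr=1e-4)          # initial LR from the quoted setup
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)  # cosine LR decay
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(EPOCHS):
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # step once per epoch for cosine decay
```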