Generating Images with 3D Annotations Using Diffusion Models

Authors: Wufei Ma, Qihao Liu, Jiahao Wang, Angtian Wang, Xiaoding Yuan, Yi Zhang, Zihao Xiao, Guofeng Zhang, Beijia Lu, Ruxiao Duan, Yongrui Qi, Adam Kortylewski, Yaoyao Liu, Alan Yuille

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our method through extensive experiments on ImageNet-100/200, ImageNet-R, PASCAL3D+, ObjectNet3D, and OOD-CV.
Researcher Affiliation | Academia | 1 Johns Hopkins University, 2 University of Freiburg, 3 Max Planck Institute for Informatics, Saarland Informatics Campus
Pseudocode | Yes | Algorithm 1: Generating images using our 3D-DST
Open Source Code | Yes | Our code is available at https://ccvl.jhu.edu/3D-DST/
Open Datasets | Yes | We demonstrate the effectiveness of our method through extensive experiments on ImageNet-100/200, ImageNet-R, PASCAL3D+, ObjectNet3D, and OOD-CV. [...] We generate images of the 3D objects taken from 3D shape repositories (e.g., ShapeNet and Objaverse)
Dataset Splits | Yes | The PASCAL3D+ dataset (Xiang et al., 2014) contains 11,045 training images and 10,812 validation images with category and object pose annotations. [...] By randomly splitting the synthetic data into training sets and validation sets, the pose consistency score (PCS) is given by a model's average performance on the validation sets after fitting the corresponding training sets. (A hedged sketch of this splitting protocol follows the table.)
Hardware Specification | Yes | We use two NVIDIA Quadro RTX 8000 GPUs for each training. [...] Each NeMo model is trained on four NVIDIA RTX A5000 GPUs for 10 hours.
Software Dependencies | No | The paper mentions software like 'Blender' and 'AdamW' (optimizer) and general frameworks like 'ControlNet' and 'LLaMA', but does not provide specific version numbers for these or any other software libraries or dependencies.
Experiment Setup | Yes | During training, the input image size is 224×224, the batch size is set to 512, and we use AdamW (Loshchilov & Hutter, 2019) as the optimizer. The initial learning rate is 1e-4 and we use a cosine scheduler for learning rate decay. We use RandAugment (Cubuk et al., 2020), random erasing (Zhong et al., 2020), Mixup (Zhang et al., 2018), and CutMix (Yun et al., 2019) for data augmentation. [...] For the classification-based model, ResNet50, we use the released implementation from Zhou et al. (2018) and train the model for 90 epochs with a learning rate of 0.01. For NeMo, we adopt the publicly released code (Wang et al., 2021) and train the NeMo models for 800 epochs on both the synthetic and real data. (A training-setup sketch follows the table.)
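
The pose consistency score (PCS) protocol quoted in the Dataset Splits row can be illustrated with a short sketch: repeatedly split the synthetic data at random into training and validation sets, fit a model on each training set, and average the validation performance. The stand-in logistic-regression pose classifier, the 80/20 split, and the number of splits below are assumptions for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch of the PCS protocol described above: average validation
# performance over repeated random train/val splits of the synthetic data.
# The classifier, features, and split sizes are stand-ins, not the paper's.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def pose_consistency_score(features: np.ndarray,
                           pose_labels: np.ndarray,
                           n_splits: int = 5,
                           val_fraction: float = 0.2,
                           seed: int = 0) -> float:
    """Average validation accuracy over repeated random train/val splits."""
    scores = []
    for i in range(n_splits):
        x_tr, x_val, y_tr, y_val = train_test_split(
            features, pose_labels, test_size=val_fraction, random_state=seed + i)
        model = LogisticRegression(max_iter=1000)  # stand-in pose classifier
        model.fit(x_tr, y_tr)                      # fit on the training split
        scores.append(model.score(x_val, y_val))   # accuracy on the held-out split
    return float(np.mean(scores))
```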
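
The Experiment Setup row spells out a concrete training configuration (224×224 inputs, batch size 512, AdamW at a 1e-4 initial learning rate with cosine decay, RandAugment and random erasing). Below is a minimal PyTorch sketch of that configuration; the dataset path, ResNet-50 backbone, class count, and epoch count are illustrative placeholders, and Mixup/CutMix (which the paper also uses) are omitted for brevity.

```python
# Hedged sketch of the quoted training setup: 224x224 inputs, batch size 512,
# AdamW at lr 1e-4 with cosine decay, RandAugment and random erasing.
# Dataset path, backbone, class count, and epoch count are placeholders.
import torch
import torchvision
from torch.utils.data import DataLoader
from torchvision import transforms

train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),   # 224x224 input crops
    transforms.RandAugment(),            # RandAugment (Cubuk et al., 2020)
    transforms.ToTensor(),
    transforms.RandomErasing(),          # random erasing (Zhong et al., 2020)
])

# Placeholder dataset path; Mixup/CutMix would be applied on batches
# (e.g. via timm.data.Mixup) and are omitted here.
train_set = torchvision.datasets.ImageFolder("data/train", transform=train_tf)
train_loader = DataLoader(train_set, batch_size=512, shuffle=True, num_workers=8)

model = torchvision.models.resnet50(num_classes=100)   # stand-in backbone
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
epochs = 90                                             # illustrative epoch count
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(epochs):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()                                    # cosine learning-rate decay
```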