Generating Images with 3D Annotations Using Diffusion Models
Authors: Wufei Ma, Qihao Liu, Jiahao Wang, Angtian Wang, Xiaoding Yuan, Yi Zhang, Zihao Xiao, Guofeng Zhang, Beijia Lu, Ruxiao Duan, Yongrui Qi, Adam Kortylewski, Yaoyao Liu, Alan Yuille
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our method through extensive experiments on ImageNet-100/200, ImageNet-R, PASCAL3D+, ObjectNet3D, and OOD-CV. |
| Researcher Affiliation | Academia | ¹Johns Hopkins University ²University of Freiburg ³Max Planck Institute for Informatics, Saarland Informatics Campus |
| Pseudocode | Yes | Algorithm 1 Generating images using our 3D-DST |
| Open Source Code | Yes | Our code is available at https://ccvl.jhu.edu/3D-DST/ |
| Open Datasets | Yes | We demonstrate the effectiveness of our method through extensive experiments on ImageNet-100/200, ImageNet-R, PASCAL3D+, ObjectNet3D, and OOD-CV. [...] We generate images of the 3D objects taken from 3D shape repositories (e.g., ShapeNet and Objaverse) (see the generation sketch below the table) |
| Dataset Splits | Yes | The PASCAL3D+ dataset (Xiang et al., 2014) contains 11,045 training images and 10,812 validation images with category and object pose annotations. [...] By randomly splitting the synthetic data into training sets and validation sets, the pose consistency score (PCS) is given by a model's average performance on the validation sets after fitting the corresponding training sets. (See the PCS sketch below the table.) |
| Hardware Specification | Yes | We use two NVIDIA Quadro RTX 8000 GPUs for each training. [...] Each NeMo model is trained on four NVIDIA RTX A5000 for 10 hours. |
| Software Dependencies | No | The paper mentions software such as Blender, the AdamW optimizer, and frameworks such as ControlNet and LLaMA, but does not provide specific version numbers for these or any other software libraries or dependencies. |
| Experiment Setup | Yes | During training, the input image size is 224×224, the batch size is set to 512, and we use AdamW (Loshchilov & Hutter, 2019) as the optimizer. The initial learning rate is 1e-4 and we use a cosine scheduler for learning rate decay. We use RandAugment (Cubuk et al., 2020), random erasing (Zhong et al., 2020), Mixup (Zhang et al., 2018), and CutMix (Yun et al., 2019) for data augmentation. [...] For the classification-based model, ResNet50, we use the released implementation from Zhou et al. (2018) and train the model for 90 epochs with a learning rate of 0.01. For NeMo, we adopt the publicly released code of Wang et al. (2021) and train the NeMo models for 800 epochs on both the synthetic and real data. (See the training-recipe sketch below the table.) |
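The pseudocode and open-dataset rows describe the 3D-DST generation pipeline: render 3D CAD models from ShapeNet/Objaverse at known camera poses, then condition a diffusion model on the rendered control signal so the sampled image inherits the rendering pose as its 3D annotation. Below is a minimal sketch of that idea using the Hugging Face `diffusers` ControlNet API; the checkpoint names, the edge-map path, and the prompt are illustrative assumptions, not the exact models or prompts used in the paper.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Canny-edge ControlNet + Stable Diffusion backbone; both checkpoint names
# are illustrative stand-ins, not the checkpoints reported in the paper.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Edge map rendered (e.g., in Blender) from a CAD model at a known camera
# pose; that pose becomes the generated image's 3D annotation. Placeholder path.
edge_map = load_image("renders/car_az030_el10_edges.png")

image = pipe(
    prompt="a photo of a car on a city street",  # illustrative; the paper uses LLM-generated prompts
    image=edge_map,
    num_inference_steps=30,
).images[0]
image.save("dst_car_az030_el10.png")
```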
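The dataset-splits row quotes the pose consistency score (PCS): the average validation performance of a model over random train/validation splits of the synthetic data. Here is a minimal sketch of that protocol, assuming generic `fit` and `evaluate` callables and an illustrative number of splits (neither is specified in the quoted text):

```python
import random
from typing import Callable, List, Sequence

def pose_consistency_score(
    samples: Sequence,
    fit: Callable[[List], object],
    evaluate: Callable[[object, List], float],
    n_splits: int = 5,      # illustrative; the quoted text does not state a count
    val_frac: float = 0.2,  # illustrative validation fraction
    seed: int = 0,
) -> float:
    """Average validation score over random train/val splits of `samples`."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_splits):
        shuffled = list(samples)
        rng.shuffle(shuffled)
        n_val = int(len(shuffled) * val_frac)
        val, train = shuffled[:n_val], shuffled[n_val:]
        model = fit(train)                   # fit a pose model on the training split
        scores.append(evaluate(model, val))  # score it on the held-out split
    return sum(scores) / len(scores)
```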
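The experiment-setup row quotes a fairly standard recipe: 224×224 inputs, batch size 512, AdamW at an initial learning rate of 1e-4 with cosine decay, and RandAugment/random-erasing augmentation (Mixup and CutMix are applied per batch). A minimal PyTorch sketch of that recipe follows; the model choice, dataset path, epoch count, and the omission of the Mixup/CutMix batch logic are simplifications, not details from the paper.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR
from torchvision import datasets, models, transforms

EPOCHS = 90        # illustrative; the paper trains ResNet50 for 90 epochs in one setting
BATCH_SIZE = 512   # batch size from the quoted setup
IMAGE_SIZE = 224   # 224x224 inputs

# RandAugment + random erasing per the quoted recipe; RandomErasing operates
# on tensors, so it comes after ToTensor. Mixup/CutMix would be applied to
# batches inside the loop and are omitted here for brevity.
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(IMAGE_SIZE),
    transforms.RandAugment(),
    transforms.ToTensor(),
    transforms.RandomErasing(),
])

dataset = datasets.ImageFolder("path/to/train", transform=train_tf)  # placeholder path
loader = torch.utils.data.DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.resnet50(num_classes=100).to(device)     # e.g., ImageNet-100
optimizer = AdamW(model.parameters(), lr=1e-4)          # initial LR from the quoted setup
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)  # cosine LR decay
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(EPOCHS):
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # step once per epoch for cosine decay
```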