ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models

Authors: Jingyuan Zhu, Shiyu Li, Yuxuan (Andy) Liu, Jian Yuan, Ping Huang, Jiulong Shan, Huimin Ma

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the effectiveness of our ODGEN in specific domains on 7 subsets of the Roboflow-100 benchmark [6]. Extensive experimental results show that adding our synthetic data improves mAP@.50:.95 by up to 25.3% on YOLO detectors, outperforming prior controllable generative methods. Furthermore, we validate ODGEN in general domains with an evaluation protocol designed based on COCO-2014 [37] and gain an advantage of up to 5.6% in mAP@.50:.95 over prior methods.
Researcher Affiliation | Collaboration | Jingyuan Zhu (Tsinghua University, China), Shiyu Li (Apple), Yuxuan Liu (Apple), Jian Yuan (Tsinghua University, China), Ping Huang (Apple), Jiulong Shan (Apple), Huimin Ma (University of Science and Technology Beijing)
Pseudocode | No | The paper describes its algorithms and pipelines through figures (e.g., Figure 2, Figure 3) and provides Algorithm 1 for bounding box conversion, but it does not present formal pseudocode for the main generation method.
Open Source Code | No | The code is not included yet. We have provided enough details for implementation to ensure reproducibility. We will release the code and models soon.
Open Datasets | Yes | We evaluate the effectiveness of our ODGEN in specific domains on 7 subsets of the Roboflow-100 benchmark [6]... Furthermore, we validate ODGEN in general domains with an evaluation protocol designed based on COCO-2014 [37].
Dataset Splits | Yes | The full validation set is used as the standard to choose the best checkpoint from YOLO models trained for 100 epochs. ...We use 10k annotations randomly sampled from the COCO validation set to generate a synthetic dataset of 10k images. The YOLO models are trained on this synthetic dataset from scratch and evaluated on the other 31k images in the COCO validation set.
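The COCO evaluation protocol quoted above amounts to a disjoint partition of the validation set: 10k randomly sampled images supply the annotations used to generate synthetic training data, and the remaining ~31k images are held out for evaluating the YOLO detectors. A minimal sketch of that split, assuming image IDs are available as a list (function and argument names here are illustrative, not from the ODGEN codebase):

```python
import random

def split_coco_val(image_ids, n_generate=10_000, seed=0):
    """Partition COCO validation image IDs into a pool whose annotations
    seed synthetic-image generation and a disjoint held-out pool for
    evaluating detectors trained on the synthetic data.

    Illustrative sketch of the protocol described in the paper's quote;
    the actual sampling code is not public.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    ids = list(image_ids)
    rng.shuffle(ids)
    gen_ids = ids[:n_generate]   # annotations used to generate 10k synthetic images
    eval_ids = ids[n_generate:]  # remaining images used only for evaluation
    return gen_ids, eval_ids

# COCO-2014 val has ~41k images, leaving ~31k for evaluation.
gen_ids, eval_ids = split_coco_val(range(41_000))
```

Keeping the evaluation pool disjoint from the sampled annotations avoids leaking layout information from evaluation images into the generated training set.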
Hardware Specification | Yes | Then we train the object-wise conditioning modules on a V100 GPU with batch size 4 for 200 epochs... We train our ODGEN on the COCO training set with batch size 32 for 60 epochs on 8 V100 GPUs.
Software Dependencies | Yes | Our approach is implemented with Stable Diffusion v2.1 and compared with prior controllable generation methods based on Stable Diffusion, including ReCo [61], GLIGEN [33], ControlNet [62], GeoDiffusion [5], InstanceDiffusion [55], and MIGC [67]. YOLO models are trained with the same recipe as Roboflow-100 [6] for 100 epochs to ensure convergence. The detectors used are YOLOv5s [25] and YOLOv7 [54].
Experiment Setup | Yes | The whole training process, including the fine-tuning on both cropped objects and entire images and the training of the object-wise conditioning module, only depends on the 200 images. λ in Eq. (1) and γ in Eq. (2) are set to 1 and 25, respectively. We first fine-tune the diffusion model according to Fig. 2 (a) for 3k iterations. Then we train the object-wise conditioning modules on a V100 GPU with batch size 4 for 200 epochs... We train our ODGEN on the COCO training set with batch size 32 for 60 epochs on 8 V100 GPUs.
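The hyperparameters scattered across the quoted setup can be collected into one place. The sketch below is only a convenience summary of the numbers reported in the paper; the key names and nesting are invented for illustration and do not reflect the authors' actual configuration schema:

```python
# Summary of the reported ODGEN training setup.
# Dict structure and key names are illustrative, not the authors' config format.
odgen_config = {
    "base_model": "stable-diffusion-v2.1",
    "loss_weights": {"lambda": 1, "gamma": 25},  # λ in Eq. (1), γ in Eq. (2)
    "finetune_iterations": 3_000,                # diffusion fine-tuning, Fig. 2 (a)
    "domain_specific": {
        "train_images": 200,                     # all training depends on 200 images
        "conditioning_module": {"batch_size": 4, "epochs": 200, "gpus": "1x V100"},
    },
    "coco": {"batch_size": 32, "epochs": 60, "gpus": "8x V100"},
    "yolo": {"epochs": 100, "recipe": "Roboflow-100"},
}
```

A single flat record like this makes it easy to check that every quantity needed to reproduce the runs (loss weights, iteration counts, batch sizes, hardware) is actually stated in the paper.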