Diversify Your Vision Datasets with Automatic Diffusion-based Augmentation

Authors: Lisa Dunlap, Alyssa Umino, Han Zhang, Jiezhi Yang, Joseph E. Gonzalez, Trevor Darrell

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate on fine-grained bird classification (CUB (41)), domain generalization (iWildCam (17)), and contextual bias (Waterbirds (32)) datasets. We show that the addition of ALIA generated data outperforms traditional data augmentation techniques and text-to-image generated data by up to 7%, even beating the performance of adding in real data on iWildCam.
Researcher Affiliation | Academia | UC Berkeley
Pseudocode | No | The paper describes its method in prose and flowcharts but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/lisadunlap/ALIA.
Open Datasets | Yes | We evaluate on fine-grained bird classification (CUB (41)), domain generalization (iWildCam (17)), and contextual bias (Waterbirds (32)) datasets.
Dataset Splits | Yes | For each method, we do a hyperparameter sweep across learning rate and weight decay and choose the parameters with the highest validation performance.
Hardware Specification | Yes | We train on 10 GeForce RTX 2080 Ti GPUs.
Software Dependencies | Yes | We use the PyTorch pretrained models (27) on ImageNet with an Adam optimizer (16) and cosine learning rate scheduler. For all the diffusion-based editing methods, we use Stable Diffusion version 1.5 (31) from Hugging Face (40) with the default hyperparameters aside from the edit strength (how much to deviate from the original image) and the text guidance (how closely the generated image should align with the text prompt). We use BLIP (19) captioning model and GPT-4 (25) for summarizing the captions.
Experiment Setup | Yes | For each method, we do a hyperparameter sweep across learning rate and weight decay and choose the parameters with the highest validation performance. For all the diffusion-based editing methods, we use Stable Diffusion version 1.5 (31) from Hugging Face (40) with the default hyperparameters aside from the edit strength (how much to deviate from the original image) and the text guidance (how closely the generated image should align with the text prompt). For these parameters, we search over 5 different values each for edit strength and text guidance, visualizing the resulting generations for a random sample (10 images) across 4 random seeds (40 generated images in total). Results are averaged over 3 random seeds and further details on the hyperparameter search space and final choice of hyperparameters are listed in the Appendix.
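The Software Dependencies row above mentions captioning the training images with BLIP before GPT-4 summarizes the captions. The sketch below shows one way such a captioning step could look with the Hugging Face transformers BLIP checkpoint; it is a minimal illustration, not the paper's code, and the checkpoint name, image paths, and generation length are assumptions. The GPT-4 summarization step is only indicated in a comment.

```python
# Minimal sketch (assumed setup): caption training images with BLIP before
# summarizing the captions. Checkpoint name and image paths are illustrative.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def caption(image_path: str) -> str:
    """Generate a single caption for one image."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=30)
    return processor.decode(out[0], skip_special_tokens=True)

captions = [caption(p) for p in ["bird_001.jpg", "bird_002.jpg"]]  # hypothetical paths
# The collected captions would then be summarized (the paper uses GPT-4) into a
# short list of domain descriptions used as editing prompts.
```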
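The Software Dependencies and Experiment Setup rows describe image editing with Stable Diffusion v1.5 where only the edit strength and text guidance are varied, searching over 5 values each and inspecting the generations. A minimal sketch of such a sweep with the Hugging Face diffusers img2img pipeline is below; the checkpoint ID, prompt, candidate values, and output paths are assumptions for illustration, not the paper's exact configuration, and the paper also evaluates other editing methods.

```python
# Minimal sketch (assumed values): sweep edit strength and text guidance for
# Stable Diffusion v1.5 img2img edits and save the results for visual inspection.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed SD v1.5 checkpoint
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("example_bird.jpg").convert("RGB").resize((512, 512))  # hypothetical input
prompt = "a photo of a bird in a snowy forest"  # hypothetical edit prompt

edit_strengths = [0.3, 0.4, 0.5, 0.6, 0.7]   # assumed 5 candidate edit strengths
text_guidances = [5.0, 6.0, 7.0, 8.0, 9.0]   # assumed 5 candidate guidance scales

# The paper visualizes a random sample of 10 images across 4 random seeds for
# each setting; this sketch edits a single image per setting for brevity.
for strength in edit_strengths:
    for guidance in text_guidances:
        edited = pipe(
            prompt=prompt,
            image=init_image,
            strength=strength,        # how much to deviate from the original image
            guidance_scale=guidance,  # how closely to follow the text prompt
        ).images[0]
        edited.save(f"edit_s{strength}_g{guidance}.png")
```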
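The Dataset Splits and Experiment Setup rows describe selecting learning rate and weight decay by validation performance when fine-tuning ImageNet-pretrained PyTorch models with an Adam optimizer and a cosine learning rate schedule. A minimal sketch of that selection loop is below; the ResNet-50 backbone, candidate grids, epoch count, and data loaders are assumptions, since the report only names the optimizer, scheduler, and pretrained weights.

```python
# Minimal sketch (assumed grids and backbone): sweep learning rate and weight
# decay, keeping the configuration with the highest validation accuracy.
import itertools
import torch
import torch.nn as nn
from torchvision import models

def train_and_validate(lr, weight_decay, train_loader, val_loader, num_classes, epochs=20):
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)  # assumed backbone
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    model = model.cuda()

    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    criterion = nn.CrossEntropyLoss()

    for _ in range(epochs):
        model.train()
        for images, labels in train_loader:
            images, labels = images.cuda(), labels.cuda()
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()

    # Validation accuracy is the model-selection criterion.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in val_loader:
            preds = model(images.cuda()).argmax(dim=1).cpu()
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total

def sweep(train_loader, val_loader, num_classes):
    lrs = [1e-4, 3e-4, 1e-3]            # hypothetical candidate learning rates
    weight_decays = [1e-5, 1e-4, 1e-3]  # hypothetical candidate weight decays
    best = max(
        ((lr, wd, train_and_validate(lr, wd, train_loader, val_loader, num_classes))
         for lr, wd in itertools.product(lrs, weight_decays)),
        key=lambda t: t[2],
    )
    return best  # (lr, weight_decay, val_accuracy) with the highest validation accuracy
```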