Diversify Your Vision Datasets with Automatic Diffusion-based Augmentation
Authors: Lisa Dunlap, Alyssa Umino, Han Zhang, Jiezhi Yang, Joseph E. Gonzalez, Trevor Darrell
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate on fine-grained bird classification (CUB (41)), domain generalization (iWildCam (17)), and contextual bias (Waterbirds (32)) datasets. We show that the addition of ALIA generated data outperforms traditional data augmentation techniques and text-to-image generated data by up to 7%, even beating the performance of adding in real data on iWildCam. |
| Researcher Affiliation | Academia | UC Berkeley |
| Pseudocode | No | The paper describes its method in prose and flowcharts but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/lisadunlap/ALIA. |
| Open Datasets | Yes | We evaluate on fine-grained bird classification (CUB (41)), domain generalization (iWildCam (17)), and contextual bias (Waterbirds (32)) datasets. |
| Dataset Splits | Yes | For each method, we do a hyperparameter sweep across learning rate and weight decay and choose the parameters with the highest validation performance. |
| Hardware Specification | Yes | We train on 10 GeForce RTX 2080 Ti GPUs. |
| Software Dependencies | Yes | We use the PyTorch pretrained models (27) on ImageNet with an Adam optimizer (16) and cosine learning rate scheduler. For all the diffusion-based editing methods, we use Stable Diffusion version 1.5 (31) from Hugging Face (40) with the default hyperparameters aside from the edit strength (how much to deviate from the original image) and the text guidance (how closely the generated image should align with the text prompt). We use BLIP (19) captioning model and GPT-4 (25) for summarizing the captions. |
| Experiment Setup | Yes | For each method, we do a hyperparameter sweep across learning rate and weight decay and choose the parameters with the highest validation performance. For all the diffusion-based editing methods, we use Stable Diffusion version 1.5 (31) from Hugging Face (40) with the default hyperparameters aside from the edit strength (how much to deviate from the original image) and the text guidance (how closely the generated image should align with the text prompt). For these parameters, we search over 5 different values each for edit strength and text guidance, visualizing the resulting generations for a random sample (10 images) across 4 random seeds (40 generated images in total). Results are averaged over 3 random seeds and further details on the hyperparameter search space and final choice of hyperparameters are listed in the Appendix. |
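
To make the Software Dependencies row concrete, below is a minimal sketch of the kind of diffusion-based editing setup it describes: Stable Diffusion v1.5 loaded from Hugging Face with only the edit strength and text guidance changed from their defaults. The checkpoint id, prompt, and parameter values here are illustrative assumptions, not settings taken from the paper or its repository.

```python
# Hedged sketch of an image-editing call with Stable Diffusion v1.5 via
# Hugging Face diffusers; only `strength` (edit strength) and `guidance_scale`
# (text guidance) deviate from the defaults. Model id, image path, prompt, and
# values are assumptions for illustration.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed SD 1.5 checkpoint id
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("bird.jpg").convert("RGB")  # placeholder input image

edited = pipe(
    prompt="a photo of a bird in a snowy forest",  # illustrative edit prompt
    image=init_image,
    strength=0.5,        # edit strength: how far to deviate from the original image
    guidance_scale=7.5,  # text guidance: how closely to follow the prompt
).images[0]
edited.save("bird_edited.png")
```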
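
The Experiment Setup row also describes a hyperparameter sweep over learning rate and weight decay with selection by validation performance. The sketch below shows one way such a sweep could look with an Adam optimizer and cosine schedule; the backbone, grid values, epoch count, and data loaders are assumptions, not the paper's actual search space.

```python
# Hedged sketch of a learning-rate / weight-decay grid search selected by
# validation accuracy, using Adam + cosine annealing as in the paper's setup.
# The ResNet-50 backbone and grid values are illustrative assumptions.
import itertools
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR
from torchvision.models import resnet50, ResNet50_Weights

def run_trial(lr, weight_decay, train_loader, val_loader, num_classes, epochs=30, device="cuda"):
    model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)  # ImageNet-pretrained backbone (assumed)
    model.fc = torch.nn.Linear(model.fc.in_features, num_classes)
    model = model.to(device)
    optimizer = Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    scheduler = CosineAnnealingLR(optimizer, T_max=epochs)
    criterion = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            criterion(model(images), labels).backward()
            optimizer.step()
        scheduler.step()
    # Validation accuracy is used for model selection across the grid.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in val_loader:
            preds = model(images.to(device)).argmax(dim=1).cpu()
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total

def sweep(train_loader, val_loader, num_classes):
    grid = itertools.product([1e-3, 1e-4, 1e-5], [1e-4, 1e-5, 0.0])  # assumed search grid
    results = {(lr, wd): run_trial(lr, wd, train_loader, val_loader, num_classes) for lr, wd in grid}
    return max(results, key=results.get)  # (lr, weight_decay) with best validation accuracy
```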