Diversify Your Vision Datasets with Automatic Diffusion-based Augmentation

Authors: Lisa Dunlap, Alyssa Umino, Han Zhang, Jiezhi Yang, Joseph E. Gonzalez, Trevor Darrell

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate on fine-grained bird classification (CUB (41)), domain generalization (iWildCam (17)), and contextual bias (Waterbirds (32)) datasets. We show that the addition of ALIA generated data outperforms traditional data augmentation techniques and text-to-image generated data by up to 7%, even beating the performance of adding in real data on iWildCam.
Researcher Affiliation | Academia | UC Berkeley
Pseudocode | No | The paper describes its method in prose and flowcharts but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/lisadunlap/ALIA.
Open Datasets | Yes | We evaluate on fine-grained bird classification (CUB (41)), domain generalization (iWildCam (17)), and contextual bias (Waterbirds (32)) datasets.
Dataset Splits | Yes | For each method, we do a hyperparameter sweep across learning rate and weight decay and choose the parameters with the highest validation performance.
Hardware Specification | Yes | We train on 10 GeForce RTX 2080 Ti GPUs.
Software Dependencies | Yes | We use the PyTorch pretrained models (27) on ImageNet with an Adam optimizer (16) and cosine learning rate scheduler. For all the diffusion-based editing methods, we use Stable Diffusion version 1.5 (31) from Hugging Face (40) with the default hyperparameters aside from the edit strength (how much to deviate from the original image) and the text guidance (how closely the generated image should align with the text prompt). We use BLIP (19) captioning model and GPT-4 (25) for summarizing the captions.
Experiment Setup | Yes | For each method, we do a hyperparameter sweep across learning rate and weight decay and choose the parameters with the highest validation performance. For all the diffusion-based editing methods, we use Stable Diffusion version 1.5 (31) from Hugging Face (40) with the default hyperparameters aside from the edit strength (how much to deviate from the original image) and the text guidance (how closely the generated image should align with the text prompt). For these parameters, we search over 5 different values each for edit strength and text guidance, visualizing the resulting generations for a random sample (10 images) across 4 random seeds (40 generated images in total). Results are averaged over 3 random seeds and further details on the hyperparameter search space and final choice of hyperparameters are listed in the Appendix.
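The Software Dependencies row above mentions captioning the training images with BLIP before GPT-4 summarizes the captions. The sketch below shows one way such a captioning step could look with the Hugging Face transformers BLIP checkpoint; it is a minimal illustration, not the paper's code, and the checkpoint name, image paths, and generation length are assumptions. The GPT-4 summarization step is only indicated in a comment.

```python
# Minimal sketch (assumed setup): caption training images with BLIP before
# summarizing the captions. Checkpoint name and image paths are illustrative.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def caption(image_path: str) -> str:
    """Generate a single caption for one image."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=30)
    return processor.decode(out[0], skip_special_tokens=True)

captions = [caption(p) for p in ["bird_001.jpg", "bird_002.jpg"]]  # hypothetical paths
# The collected captions would then be summarized (the paper uses GPT-4) into a
# short list of domain descriptions used as editing prompts.
```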
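The Software Dependencies and Experiment Setup rows describe image editing with Stable Diffusion v1.5 where only the edit strength and text guidance are varied, searching over 5 values each and inspecting the generations. A minimal sketch of such a sweep with the Hugging Face diffusers img2img pipeline is below; the checkpoint ID, prompt, candidate values, and output paths are assumptions for illustration, not the paper's exact configuration, and the paper also evaluates other editing methods.

```python
# Minimal sketch (assumed values): sweep edit strength and text guidance for
# Stable Diffusion v1.5 img2img edits and save the results for visual inspection.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed SD v1.5 checkpoint
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("example_bird.jpg").convert("RGB").resize((512, 512))  # hypothetical input
prompt = "a photo of a bird in a snowy forest"  # hypothetical edit prompt

edit_strengths = [0.3, 0.4, 0.5, 0.6, 0.7]   # assumed 5 candidate edit strengths
text_guidances = [5.0, 6.0, 7.0, 8.0, 9.0]   # assumed 5 candidate guidance scales

# The paper visualizes a random sample of 10 images across 4 random seeds for
# each setting; this sketch edits a single image per setting for brevity.
for strength in edit_strengths:
    for guidance in text_guidances:
        edited = pipe(
            prompt=prompt,
            image=init_image,
            strength=strength,        # how much to deviate from the original image
            guidance_scale=guidance,  # how closely to follow the text prompt
        ).images[0]
        edited.save(f"edit_s{strength}_g{guidance}.png")
```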
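The Dataset Splits and Experiment Setup rows describe selecting learning rate and weight decay by validation performance when fine-tuning ImageNet-pretrained PyTorch models with an Adam optimizer and a cosine learning rate schedule. A minimal sketch of that selection loop is below; the ResNet-50 backbone, candidate grids, epoch count, and data loaders are assumptions, since the report only names the optimizer, scheduler, and pretrained weights.

```python
# Minimal sketch (assumed grids and backbone): sweep learning rate and weight
# decay, keeping the configuration with the highest validation accuracy.
import itertools
import torch
import torch.nn as nn
from torchvision import models

def train_and_validate(lr, weight_decay, train_loader, val_loader, num_classes, epochs=20):
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)  # assumed backbone
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    model = model.cuda()

    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    criterion = nn.CrossEntropyLoss()

    for _ in range(epochs):
        model.train()
        for images, labels in train_loader:
            images, labels = images.cuda(), labels.cuda()
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()

    # Validation accuracy is the model-selection criterion.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in val_loader:
            preds = model(images.cuda()).argmax(dim=1).cpu()
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total

def sweep(train_loader, val_loader, num_classes):
    lrs = [1e-4, 3e-4, 1e-3]            # hypothetical candidate learning rates
    weight_decays = [1e-5, 1e-4, 1e-3]  # hypothetical candidate weight decays
    best = max(
        ((lr, wd, train_and_validate(lr, wd, train_loader, val_loader, num_classes))
         for lr, wd in itertools.product(lrs, weight_decays)),
        key=lambda t: t[2],
    )
    return best  # (lr, weight_decay, val_accuracy) with the highest validation accuracy
```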