Not Just Pretty Pictures: Toward Interventional Data Augmentation Using Text-to-Image Generators

Authors: Jianhao Yuan, Francesco Pinto, Adam Davies, Philip Torr

ICML 2024

Reproducibility Variable / Result / LLM Response

Research Type: Experimental
"We experiment across a diverse collection of benchmarks in single domain generalization (SDG) and reducing reliance on spurious features (RRSF), ablating across key dimensions of T2I generation, including interventional prompting strategies, conditioning mechanisms, and post-hoc filtering. Our extensive empirical findings demonstrate that modern T2I generators like Stable Diffusion can indeed be used as a powerful interventional data augmentation mechanism, outperforming previously state-of-the-art data augmentation techniques regardless of how each dimension is configured."

Researcher Affiliation: Academia
"1 University of Oxford; 2 University of Illinois Urbana-Champaign."

Pseudocode: Yes
"Algorithm 1: Augmentation Algorithm"

Open Source Code: Yes
"Code available at: https://github.com/YuanJianhao508/NotJustPrettyPictures"

Open Datasets: Yes
"For all experiments, we use Stable Diffusion v1.5 (Rombach et al., 2021) pre-trained on LAION-Aesthetics. LAION-Aesthetics is a subset of the LAION-5B dataset (Schuhmann et al., 2022a) consisting of web-scraped text-image pairs with high aesthetic scores (Schuhmann et al., 2022b)."

Dataset Splits: No
The paper reports "No. validation" counts in Table 23 among its dataset statistics, but it does not specify how the validation set is constructed or used for splits and hyperparameter tuning in enough detail for reproduction.

Hardware Specification: Yes
"The general statistics of computational expense of each type of generative model on an NVIDIA A40 GPU and generator with hyperparameters specified for the OfficeHome experiment are as follows:"

Software Dependencies: No
The paper names specific models (e.g., Stable Diffusion v1.5, ResNet-18, ResNet-50) and frameworks (e.g., CLIP, T5) but does not give version numbers for software dependencies such as Python, PyTorch, or CUDA.

Experiment Setup: Yes
"The training hyperparameters for each setting are specified as shown in Tab. 19." Under "Generator Hyperparameter": "For the two types of generative models, we use the hyperparameters for each dataset as shown in Tab. 20."