Real-Fake: Effective Training Data Synthesis Through Distribution Matching

Authors: Jianhao Yuan, Jie Zhang, Shuyang Sun, Philip Torr, Bo Zhao

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through extensive experiments, we demonstrate the effectiveness of our synthetic data across diverse image classification tasks, both as a replacement for and augmentation to real datasets."
Researcher Affiliation | Collaboration | University of Oxford; ETH Zurich; Beijing Academy of Artificial Intelligence
Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | Code released at: https://github.com/BAAI-DCAI/Training-Data-Synthesis
Open Datasets | Yes | "Datasets. We conduct benchmark experiments with ResNet50 (He et al., 2016) across three ImageNet datasets: ImageNette (IN-10) (Howard, 2019), ImageNet100 (IN-100) (Tian et al., 2020), and ImageNet1K (IN-1K) (Deng et al., 2009). Beyond these, we also experiment with several fine-grained image classification datasets, CUB (Wah et al., 2011), Cars (Krause et al., 2013), PET (Parkhi et al., 2012), and satellite images, EuroSAT (Helber et al., 2018)."
Dataset Splits | No | The paper provides 'Training Data Size' and 'Test Data Size' in Table 5 for each dataset, but does not explicitly detail a separate validation split size or the methodology for all three splits.
Hardware Specification | No | No specific hardware details (such as GPU/CPU models or cloud instance types) used for running the experiments are mentioned in the paper.
Software Dependencies | Yes | "We finetune Stable Diffusion 1.5 (SDv1.5) (Rombach et al., 2022) with LoRA."
Experiment Setup | Yes | "The stable diffusion generation parameters are specified in Tab. 6. We use text prompt mentioned in Sec. 3.2... The fine-tuning hyperparameters used are specified in Tab. 7. ...the training hyperparameters are specified in Tab. 8."
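The Software Dependencies row notes that the authors finetune SDv1.5 with LoRA. As a minimal sketch of the LoRA parameterization itself (not the authors' training code, and with toy matrices rather than diffusion weights), a frozen layer W is adapted by a low-rank update scaled by alpha/r: y = Wx + (alpha/r) * B(Ax), where A (r x d_in) and B (d_out x r) are the only trainable matrices:

```python
# Pure-Python sketch of a LoRA forward pass: y = W x + (alpha/r) * B (A x).
# W is the frozen base weight (d_out x d_in); A (r x d_in) and B (d_out x r)
# are the small trainable low-rank factors. All names here are illustrative.

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, alpha=1.0, r=1):
    base = matvec(W, x)                 # frozen path: W x
    delta = matvec(B, matvec(A, x))     # low-rank path: B (A x)
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

# Toy example: 2x2 identity base, rank-1 update touching the first input dim.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 0.0]]          # r=1, d_in=2
B = [[1.0], [0.0]]        # d_out=2, r=1
print(lora_forward(W, A, B, [3.0, 4.0]))  # -> [6.0, 4.0]
```

Because only A and B receive gradients, the number of trainable parameters is r*(d_in + d_out) per adapted layer, which is what makes LoRA finetuning of a large model like SDv1.5 cheap.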