Real-Fake: Effective Training Data Synthesis Through Distribution Matching

Authors: Jianhao Yuan, Jie Zhang, Shuyang Sun, Philip Torr, Bo Zhao

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through extensive experiments, we demonstrate the effectiveness of our synthetic data across diverse image classification tasks, both as a replacement for and augmentation to real datasets."
Researcher Affiliation | Collaboration | University of Oxford; ETH Zurich; Beijing Academy of Artificial Intelligence
Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | Code released at: https://github.com/BAAI-DCAI/Training-Data-Synthesis
Open Datasets | Yes | "Datasets. We conduct benchmark experiments with ResNet50 (He et al., 2016) across three ImageNet datasets: ImageNette (IN-10) (Howard, 2019), ImageNet100 (IN-100) (Tian et al., 2020), and ImageNet1K (IN-1K) (Deng et al., 2009). Beyond these, we also experiment with several fine-grained image classification datasets, CUB (Wah et al., 2011), Cars (Krause et al., 2013), PET (Parkhi et al., 2012), and satellite images, EuroSAT (Helber et al., 2018)."
Dataset Splits | No | The paper provides 'Training Data Size' and 'Test Data Size' in Table 5 for each dataset, but does not explicitly detail a separate validation split size or the methodology for all three splits.
Hardware Specification | No | No specific hardware details (such as GPU/CPU models or cloud instance types) used for running the experiments are mentioned in the paper.
Software Dependencies | Yes | "We finetune Stable Diffusion 1.5 (SDv1.5) (Rombach et al., 2022) with LoRA."
Experiment Setup | Yes | "The stable diffusion generation parameters are specified in Tab. 6. We use text prompt mentioned in Sec. 3.2... The fine-tuning hyperparameters used are specified in Tab. 7. ...the training hyperparameters are specified in Tab. 8."
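The Software Dependencies row notes that the authors finetune SDv1.5 with LoRA. As a minimal sketch of the LoRA parameterization itself (not the authors' training code, and with toy matrices rather than diffusion weights), a frozen layer W is adapted by a low-rank update scaled by alpha/r: y = Wx + (alpha/r) * B(Ax), where A (r x d_in) and B (d_out x r) are the only trainable matrices:

```python
# Pure-Python sketch of a LoRA forward pass: y = W x + (alpha/r) * B (A x).
# W is the frozen base weight (d_out x d_in); A (r x d_in) and B (d_out x r)
# are the small trainable low-rank factors. All names here are illustrative.

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, alpha=1.0, r=1):
    base = matvec(W, x)                 # frozen path: W x
    delta = matvec(B, matvec(A, x))     # low-rank path: B (A x)
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

# Toy example: 2x2 identity base, rank-1 update touching the first input dim.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 0.0]]          # r=1, d_in=2
B = [[1.0], [0.0]]        # d_out=2, r=1
print(lora_forward(W, A, B, [3.0, 4.0]))  # -> [6.0, 4.0]
```

Because only A and B receive gradients, the number of trainable parameters is r*(d_in + d_out) per adapted layer, which is what makes LoRA finetuning of a large model like SDv1.5 cheap.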