Semantic-Guided Generative Image Augmentation Method with Diffusion Models for Image Classification
Authors: Bohan Li, Xiao Xu, Xinghao Wang, Yutai Hou, Yunlong Feng, Feng Wang, Xuanliang Zhang, Qingfu Zhu, Wanxiang Che
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that SGID outperforms the best augmentation baseline by 1.72% on ResNet-50 (from scratch), 0.33% on ViT (ImageNet-21k), and 0.14% on CLIP-ViT (LAION-2B). |
| Researcher Affiliation | Academia | Harbin Institute of Technology, Harbin, China {bhli, xxu, xhwang, ythou, ylfeng}@ir.hit.edu.cn, {7203610216, 120l020412}@stu.hit.edu.cn, {qfzhu, car}@ir.hit.edu.cn |
| Pseudocode | No | The paper describes the steps of the proposed method (SGID) using text and a diagram (Figure 2), but does not include structured pseudocode or an algorithm block. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing open-source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | Datasets: We evaluate the effectiveness of our proposed method on seven commonly used datasets, including three coarse-grained object classification datasets: CIFAR-10, CIFAR-100 (Krizhevsky 2009), Caltech101 (Cal101) (Fei-Fei, Fergus, and Perona 2004), and four fine-grained object classification datasets: Stanford Cars (Cars) (Krause et al. 2013), Flowers102 (Flowers) (Nilsback and Zisserman 2008), Oxford Pets (Pets) (Parkhi et al. 2012) and texture classification DTD (Cimpoi et al. 2014). |
| Dataset Splits | No | The paper mentions using a 'validation set' to choose sampling strategies ('We choose one of the sampling strategies based on the results of the validation set.') but does not provide specific percentages or counts for training, validation, and test splits, nor does it explicitly detail the splitting methodology for reproduction. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software like BLIP, CLIP, and Stable Diffusion, but it does not provide specific version numbers for these or any other ancillary software dependencies. |
| Experiment Setup | Yes | We use p = 0.9 by default in nucleus sampling and num_beams = 3 by default in beam search. The default CLIP-ViT-B/32 model is used for calculating image-text similarity. We apply the pre-trained stable-diffusion-v1-5 model and generate one augmented image for each original image. Empirically, we take f(s) = 4s^2 + 2s + 1 as the guidance mapping function. We select the noise rate n from {0.3, 0.5, 0.7}. As for prompt weighting, we assign a weight of 1.50 to the labels... and a weight of 0.90 to the caption... We run each method over 5 different random seeds. (An illustrative reproduction sketch of this setup follows the table.) |
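
Since the paper provides neither pseudocode nor released code, the following is a minimal, hedged sketch of the SGID-style augmentation loop described above (BLIP caption → CLIP image-text similarity → guidance-scale mapping → Stable Diffusion image-to-image), assuming the Hugging Face `transformers` and `diffusers` libraries. The checkpoint identifiers, the prompt template, and the use of the img2img `strength` argument for the paper's noise rate are assumptions of this sketch; the paper's prompt weighting (1.50 for labels, 0.90 for the caption) is omitted, as it would require an additional tool such as Compel. This is not the authors' implementation.

```python
"""Hedged sketch of an SGID-style augmentation step; checkpoint names,
the prompt template, and the noise-rate -> `strength` mapping are
assumptions, not taken from an official release."""
import torch
from PIL import Image
from transformers import (
    BlipProcessor, BlipForConditionalGeneration,
    CLIPProcessor, CLIPModel,
)
from diffusers import StableDiffusionImg2ImgPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

# BLIP for image captioning (the exact checkpoint is an assumption).
blip_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base").to(device)

# CLIP-ViT-B/32 for image-text similarity, as stated in the setup row.
clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)

# stable-diffusion-v1-5 for image-to-image generation, as stated in the setup row.
sd = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5").to(device)


def caption_image(image: Image.Image) -> str:
    """Nucleus sampling with p = 0.9 (paper default); beam search with
    num_beams = 3 is the alternative mentioned in the paper."""
    inputs = blip_processor(images=image, return_tensors="pt").to(device)
    out = blip.generate(**inputs, do_sample=True, top_p=0.9, max_new_tokens=30)
    return blip_processor.decode(out[0], skip_special_tokens=True)


def image_text_similarity(image: Image.Image, text: str) -> float:
    """Cosine similarity between CLIP image and text embeddings."""
    inputs = clip_processor(text=[text], images=image, return_tensors="pt",
                            padding=True, truncation=True).to(device)
    with torch.no_grad():
        out = clip(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img * txt).sum())


def guidance_scale(s: float) -> float:
    """Guidance mapping function reported in the paper: f(s) = 4*s**2 + 2*s + 1."""
    return 4 * s ** 2 + 2 * s + 1


def augment(image: Image.Image, label: str, noise_rate: float = 0.5) -> Image.Image:
    """Generate one augmented image per original image (paper default).
    noise_rate is selected from {0.3, 0.5, 0.7}; passing it as the img2img
    `strength` argument is an assumption of this sketch."""
    caption = caption_image(image)
    prompt = f"{label}, {caption}"  # label + caption prompt (weighting omitted)
    s = image_text_similarity(image, prompt)
    return sd(prompt=prompt, image=image,
              strength=noise_rate,
              guidance_scale=guidance_scale(s)).images[0]
```

In this sketch, `guidance_scale` applies the reported mapping f(s) = 4s^2 + 2s + 1 to the raw CLIP cosine similarity s; whether the paper normalizes s before applying the mapping is not specified in the quoted setup, so any such normalization is left out.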