Semantic-Guided Generative Image Augmentation Method with Diffusion Models for Image Classification

Authors: Bohan Li, Xiao Xu, Xinghao Wang, Yutai Hou, Yunlong Feng, Feng Wang, Xuanliang Zhang, Qingfu Zhu, Wanxiang Che

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that SGID outperforms the best augmentation baseline by 1.72% on ResNet-50 (from scratch), 0.33% on ViT (ImageNet-21k), and 0.14% on CLIP-ViT (LAION-2B).
Researcher Affiliation | Academia | Harbin Institute of Technology, Harbin, China. {bhli, xxu, xhwang, ythou, ylfeng}@ir.hit.edu.cn, {7203610216, 120l020412}@stu.hit.edu.cn, {qfzhu, car}@ir.hit.edu.cn
Pseudocode | No | The paper describes the steps of the proposed method (SGID) in text and a diagram (Figure 2), but does not include structured pseudocode or an algorithm block.
Open Source Code | No | The paper does not provide an explicit statement about releasing open-source code or a link to a code repository for the described methodology.
Open Datasets | Yes | Datasets: We evaluate the effectiveness of our proposed method on seven commonly used datasets, including three coarse-grained object classification datasets: CIFAR-10, CIFAR-100 (Krizhevsky 2009), Caltech101 (Cal101) (Fei-Fei, Fergus, and Perona 2004), and four fine-grained object classification datasets: Stanford Cars (Cars) (Krause et al. 2013), Flowers102 (Flowers) (Nilsback and Zisserman 2008), Oxford Pets (Pets) (Parkhi et al. 2012) and texture classification DTD (Cimpoi et al. 2014). (A loading sketch for these datasets follows the table.)
Dataset Splits | No | The paper mentions using a validation set to choose sampling strategies ('We choose one of the sampling strategies based on the results of the validation set.'), but it does not give percentages or counts for the training/validation/test splits, nor does it describe the splitting methodology in enough detail to reproduce it.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions software such as BLIP, CLIP, and Stable Diffusion, but it does not give version numbers for these or any other software dependencies. (A version-recording snippet follows the table.)
Experiment Setup | Yes | We use p = 0.9 by default in nucleus sampling and num_beams = 3 by default in beam search. The default CLIP-ViT-B/32 model is used for calculating image-text similarity. We apply the pre-trained stable-diffusion-v1-5 model and generate one augmented image for each original image. Empirically, we take f(s) = 4s^2 + 2s + 1 as the guidance mapping function. We select the noise rate n from {0.3, 0.5, 0.7}. As for prompt weighting, we assign a weight of 1.50 to the labels... and a weight of 0.90 to the caption... We run each method over 5 different random seeds. (A sketch of the augmentation loop implied by this setup follows the table.)
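
Taken together, the quoted setup outlines one augmentation pass: BLIP captions the original image (nucleus sampling with p = 0.9, or beam search with num_beams = 3), CLIP-ViT-B/32 scores image-caption similarity, the mapping f(s) = 4s^2 + 2s + 1 turns that score into a guidance value, and stable-diffusion-v1-5 regenerates the image at a chosen noise rate. The sketch below assembles these pieces with Hugging Face transformers and diffusers; the specific checkpoints, the reading of the noise rate as the img2img strength, the use of f(s) as the classifier-free guidance scale, and the omission of prompt weighting are assumptions on our part, not the authors' implementation.

```python
# Sketch of one SGID-style augmentation pass, assembled from the quoted setup.
# Checkpoint names, the img2img reading of the noise rate, and the use of f(s) as the
# classifier-free guidance scale are assumptions; this is not the authors' code.
import torch
from PIL import Image
from transformers import (BlipProcessor, BlipForConditionalGeneration,
                          CLIPProcessor, CLIPModel)
from diffusers import StableDiffusionImg2ImgPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

blip_proc = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base").to(device)
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
sd = StableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)


def augment(image: Image.Image, label: str, noise_rate: float = 0.5) -> Image.Image:
    # 1) Caption the original image with BLIP, nucleus sampling at p = 0.9.
    blip_in = blip_proc(images=image, return_tensors="pt").to(device)
    ids = blip.generate(**blip_in, do_sample=True, top_p=0.9, max_new_tokens=30)
    caption = blip_proc.decode(ids[0], skip_special_tokens=True)

    # 2) Image-caption similarity s via CLIP-ViT-B/32 (cosine similarity of the embeddings).
    clip_in = clip_proc(text=[caption], images=image, return_tensors="pt", padding=True).to(device)
    with torch.no_grad():
        out = clip(**clip_in)
        s = torch.nn.functional.cosine_similarity(out.image_embeds, out.text_embeds).item()

    # 3) Guidance mapping from the paper: f(s) = 4s^2 + 2s + 1 (used here as the guidance
    #    scale, which is an assumption about where f(s) enters the pipeline).
    guidance = 4 * s ** 2 + 2 * s + 1

    # 4) Regenerate with Stable Diffusion img2img, conditioned on label + caption. The paper
    #    additionally weights the label at 1.50 and the caption at 0.90; that is omitted here.
    prompt = f"a photo of a {label}, {caption}"
    result = sd(prompt=prompt, image=image, strength=noise_rate, guidance_scale=guidance)
    return result.images[0]
```

In diffusers, strength controls how much noise is added to the source image before denoising, which is why it is used here as the counterpart of the paper's noise rate chosen from {0.3, 0.5, 0.7}.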
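
All seven evaluation datasets listed under Open Datasets are distributed with torchvision, so a reproduction can obtain them directly even though the paper does not say how the data were acquired. The loader choices, root path, and transform below are ours, not the paper's, and some datasets (notably Stanford Cars) may require a manual download depending on the torchvision version.

```python
# Sketch of loading the seven evaluation datasets via torchvision (an assumption; the paper
# does not say which loaders were used).
from torchvision import datasets, transforms

tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
root = "./data"  # placeholder path

train_sets = {
    "CIFAR-10":   datasets.CIFAR10(root, train=True, download=True, transform=tfm),
    "CIFAR-100":  datasets.CIFAR100(root, train=True, download=True, transform=tfm),
    "Caltech101": datasets.Caltech101(root, download=True, transform=tfm),
    "Cars":       datasets.StanfordCars(root, split="train", download=True, transform=tfm),
    "Flowers":    datasets.Flowers102(root, split="train", download=True, transform=tfm),
    "Pets":       datasets.OxfordIIITPet(root, split="trainval", download=True, transform=tfm),
    "DTD":        datasets.DTD(root, split="train", download=True, transform=tfm),
}
```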
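
Because no versions are reported for BLIP, CLIP, Stable Diffusion, or the surrounding stack, a reproduction should at least record its own environment. The package set below is an assumption inferred from the models named in the paper.

```python
# Record the library versions actually used in a reproduction; the paper names the models
# (BLIP, CLIP, Stable Diffusion) but not the software stack or its version numbers.
import torch, torchvision, transformers, diffusers

for name, mod in [("torch", torch), ("torchvision", torchvision),
                  ("transformers", transformers), ("diffusers", diffusers)]:
    print(f"{name}=={mod.__version__}")
```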