Semantic-Guided Generative Image Augmentation Method with Diffusion Models for Image Classification

Authors: Bohan Li, Xiao Xu, Xinghao Wang, Yutai Hou, Yunlong Feng, Feng Wang, Xuanliang Zhang, Qingfu Zhu, Wanxiang Che

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that SGID outperforms the best augmentation baseline by 1.72% on ResNet-50 (from scratch), 0.33% on ViT (ImageNet-21k), and 0.14% on CLIP-ViT (LAION-2B).
Researcher Affiliation | Academia | Harbin Institute of Technology, Harbin, China. {bhli, xxu, xhwang, ythou, ylfeng}@ir.hit.edu.cn, {7203610216, 120l020412}@stu.hit.edu.cn, {qfzhu, car}@ir.hit.edu.cn
Pseudocode | No | The paper describes the steps of the proposed method (SGID) in text and a diagram (Figure 2), but does not include structured pseudocode or an algorithm block.
Open Source Code | No | The paper does not provide an explicit statement about releasing open-source code or a link to a code repository for the described methodology.
Open Datasets | Yes | Datasets: We evaluate the effectiveness of our proposed method on seven commonly used datasets, including three coarse-grained object classification datasets: CIFAR-10, CIFAR-100 (Krizhevsky 2009), Caltech101 (Cal101) (Fei-Fei, Fergus, and Perona 2004), and four fine-grained object classification datasets: Stanford Cars (Cars) (Krause et al. 2013), Flowers102 (Flowers) (Nilsback and Zisserman 2008), Oxford Pets (Pets) (Parkhi et al. 2012) and texture classification DTD (Cimpoi et al. 2014). (A loading sketch for these datasets follows the table.)
Dataset Splits | No | The paper mentions using a validation set to choose sampling strategies ('We choose one of the sampling strategies based on the results of the validation set.'), but it does not give percentages or counts for the training/validation/test splits, nor does it describe the splitting methodology in enough detail to reproduce it.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions software such as BLIP, CLIP, and Stable Diffusion, but it does not give version numbers for these or any other software dependencies. (A version-recording snippet follows the table.)
Experiment Setup | Yes | We use p = 0.9 by default in nucleus sampling and num_beams = 3 by default in beam search. The default CLIP-ViT-B/32 model is used for calculating image-text similarity. We apply the pre-trained stable-diffusion-v1-5 model and generate one augmented image for each original image. Empirically, we take f(s) = 4s^2 + 2s + 1 as the guidance mapping function. We select the noise rate n from {0.3, 0.5, 0.7}. As for prompt weighting, we assign a weight of 1.50 to the labels... and a weight of 0.90 to the caption... We run each method over 5 different random seeds. (A sketch of the augmentation loop implied by this setup follows the table.)
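
Taken together, the quoted setup outlines one augmentation pass: BLIP captions the original image (nucleus sampling with p = 0.9, or beam search with num_beams = 3), CLIP-ViT-B/32 scores image-caption similarity, the mapping f(s) = 4s^2 + 2s + 1 turns that score into a guidance value, and stable-diffusion-v1-5 regenerates the image at a chosen noise rate. The sketch below assembles these pieces with Hugging Face transformers and diffusers; the specific checkpoints, the reading of the noise rate as the img2img strength, the use of f(s) as the classifier-free guidance scale, and the omission of prompt weighting are assumptions on our part, not the authors' implementation.

```python
# Sketch of one SGID-style augmentation pass, assembled from the quoted setup.
# Checkpoint names, the img2img reading of the noise rate, and the use of f(s) as the
# classifier-free guidance scale are assumptions; this is not the authors' code.
import torch
from PIL import Image
from transformers import (BlipProcessor, BlipForConditionalGeneration,
                          CLIPProcessor, CLIPModel)
from diffusers import StableDiffusionImg2ImgPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

blip_proc = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base").to(device)
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
sd = StableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)


def augment(image: Image.Image, label: str, noise_rate: float = 0.5) -> Image.Image:
    # 1) Caption the original image with BLIP, nucleus sampling at p = 0.9.
    blip_in = blip_proc(images=image, return_tensors="pt").to(device)
    ids = blip.generate(**blip_in, do_sample=True, top_p=0.9, max_new_tokens=30)
    caption = blip_proc.decode(ids[0], skip_special_tokens=True)

    # 2) Image-caption similarity s via CLIP-ViT-B/32 (cosine similarity of the embeddings).
    clip_in = clip_proc(text=[caption], images=image, return_tensors="pt", padding=True).to(device)
    with torch.no_grad():
        out = clip(**clip_in)
        s = torch.nn.functional.cosine_similarity(out.image_embeds, out.text_embeds).item()

    # 3) Guidance mapping from the paper: f(s) = 4s^2 + 2s + 1 (used here as the guidance
    #    scale, which is an assumption about where f(s) enters the pipeline).
    guidance = 4 * s ** 2 + 2 * s + 1

    # 4) Regenerate with Stable Diffusion img2img, conditioned on label + caption. The paper
    #    additionally weights the label at 1.50 and the caption at 0.90; that is omitted here.
    prompt = f"a photo of a {label}, {caption}"
    result = sd(prompt=prompt, image=image, strength=noise_rate, guidance_scale=guidance)
    return result.images[0]
```

In diffusers, strength controls how much noise is added to the source image before denoising, which is why it is used here as the counterpart of the paper's noise rate chosen from {0.3, 0.5, 0.7}.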
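
All seven evaluation datasets listed under Open Datasets are distributed with torchvision, so a reproduction can obtain them directly even though the paper does not say how the data were acquired. The loader choices, root path, and transform below are ours, not the paper's, and some datasets (notably Stanford Cars) may require a manual download depending on the torchvision version.

```python
# Sketch of loading the seven evaluation datasets via torchvision (an assumption; the paper
# does not say which loaders were used).
from torchvision import datasets, transforms

tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
root = "./data"  # placeholder path

train_sets = {
    "CIFAR-10":   datasets.CIFAR10(root, train=True, download=True, transform=tfm),
    "CIFAR-100":  datasets.CIFAR100(root, train=True, download=True, transform=tfm),
    "Caltech101": datasets.Caltech101(root, download=True, transform=tfm),
    "Cars":       datasets.StanfordCars(root, split="train", download=True, transform=tfm),
    "Flowers":    datasets.Flowers102(root, split="train", download=True, transform=tfm),
    "Pets":       datasets.OxfordIIITPet(root, split="trainval", download=True, transform=tfm),
    "DTD":        datasets.DTD(root, split="train", download=True, transform=tfm),
}
```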
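
Because no versions are reported for BLIP, CLIP, Stable Diffusion, or the surrounding stack, a reproduction should at least record its own environment. The package set below is an assumption inferred from the models named in the paper.

```python
# Record the library versions actually used in a reproduction; the paper names the models
# (BLIP, CLIP, Stable Diffusion) but not the software stack or its version numbers.
import torch, torchvision, transformers, diffusers

for name, mod in [("torch", torch), ("torchvision", torchvision),
                  ("transformers", transformers), ("diffusers", diffusers)]:
    print(f"{name}=={mod.__version__}")
```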