Advancing Fine-Grained Classification by Structure and Subject Preserving Augmentation

Authors: Eyal Michaeli, Ohad Fried

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct extensive experiments and benchmark SaSPA against both traditional and recent generative data augmentation methods. SaSPA consistently outperforms all established baselines across multiple settings, including full dataset training, contextual bias, and few-shot classification."
Researcher Affiliation | Academia | "Eyal Michaeli, Department of Computer Science, Reichman University, eyal.michaeli@post.runi.ac.il; Ohad Fried, Department of Computer Science, Reichman University, ofried@runi.ac.il"
Pseudocode | No | The paper describes the pipeline in text and with a diagram (Figure 2), but does not provide structured pseudocode or an algorithm block.
Open Source Code | Yes | "We release our source code."
Open Datasets | Yes | "We evaluate on five FGVC datasets, using the full datasets for training. We use Aircraft [30], Stanford Cars [24], CUB [58], DTD [9], and CompCars [64]."
Dataset Splits | Yes | "For datasets lacking a predefined validation split, we establish one. For CompCars, we utilize the exterior car parts split, focusing exclusively on classifying images of car components: head light, tail light, fog light, and front, into the correct car type. Further details on the exact splits are provided in Appendix C. ... For datasets lacking a validation split (Cars, CUB, CompCars), we generate one by using 33% of the training set."
Hardware Specification | Yes | "We use four NVIDIA GeForce RTX 3090 GPUs for image generation and training. ... Training with ResNet50 necessitates up to 5.5 GB of GPU RAM."
Software Dependencies | No | "All generative methods use the Diffusers library [57]. We employ BLIP-Diffusion [27] and ControlNet [68] for SaSPA. ... Stable Diffusion v1.5 [46]. ... PyTorch [39]." The paper names its software dependencies but does not specify their version numbers (e.g., PyTorch 1.x).
Experiment Setup | Yes | "Optimization is performed with an SGD optimizer, with a momentum of 0.9 and a weight decay of 10⁻⁵, over 140 epochs. We adjust the learning rate and batch size during hyper-parameter tuning to achieve the highest validation accuracy. Training images are resized to 224x224 pixels. Results are averaged across three seeds. Specific values of hyper-parameters are in Table 20. ... To select the hyper-parameters for each dataset, we train CAL [44] with learning rates of [0.00001, 0.0001, 0.001, 0.01, 0.1] and batch size [4, 8, 16, 32], selecting the configuration that results in the highest validation accuracy."
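The validation-split protocol quoted above (hold out 33% of the training set for Cars, CUB, and CompCars) can be sketched as follows. This is a minimal reconstruction, not the authors' code; the function name and the fixed seed are our own assumptions.

```python
import random


def make_validation_split(train_indices, val_fraction=0.33, seed=0):
    """Hold out a fraction of the training set as a validation split.

    Hypothetical sketch of the paper's protocol: 33% of the training
    indices become validation, the rest remain for training.
    Returns (train_indices, val_indices) with no overlap.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible split (our assumption)
    indices = list(train_indices)
    rng.shuffle(indices)
    n_val = int(len(indices) * val_fraction)
    return indices[n_val:], indices[:n_val]
```

In practice the returned index lists would be wrapped in `torch.utils.data.Subset` objects over the original training dataset.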
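The hyper-parameter tuning described in the Experiment Setup row is a plain grid search over the listed learning rates and batch sizes, keeping the configuration with the highest validation accuracy. A minimal sketch, where `train_and_eval` is a hypothetical callable standing in for the full CAL training loop:

```python
import itertools

# Grid reported in the paper's tuning protocol.
LEARNING_RATES = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
BATCH_SIZES = [4, 8, 16, 32]


def select_hyperparameters(train_and_eval):
    """Return (best_val_acc, best_lr, best_batch_size).

    `train_and_eval(lr=..., batch_size=...)` is assumed to train the
    classifier with the given settings and return validation accuracy;
    it is a placeholder for the actual training run.
    """
    best = None
    for lr, bs in itertools.product(LEARNING_RATES, BATCH_SIZES):
        acc = train_and_eval(lr=lr, batch_size=bs)
        if best is None or acc > best[0]:
            best = (acc, lr, bs)
    return best
```

With 5 learning rates and 4 batch sizes this evaluates 20 configurations per dataset; the winning pair would then be used for the final runs averaged over three seeds.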