Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Salient Concept-Aware Generative Data Augmentation

Authors: Tianchen Zhao, Xuanbai Chen, Zhihua Li, Jun Fang, Dongsheng An, Xiang Xu, Zhuowen Tu, Yifan Xing

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our approach demonstrates superior performance across eight fine-grained vision datasets, outperforming state-of-the-art augmentation methods with averaged classification accuracy improvements of 0.73% and 6.5% under conventional and long-tail settings, respectively. Experiments on eight Fine-Grained Visual Categorization (FGVC) datasets demonstrate state-of-the-art performance over existing generative data augmentation methods. We introduce the experimental settings in Section 4.1. We compare our method with GDA baselines in Section 4.2. Ablation studies and qualitative analysis are shown in Section 4.3.
Researcher Affiliation	Academia	The provided text snippet only lists the authors' names (Tianchen Zhao, Xuanbai Chen, Zhihua Li, Jun Fang, Dongsheng An, Xiang Xu, Zhuowen Tu, Yifan Xing) without any explicit institutional affiliations or email addresses, making it impossible to accurately classify their affiliation type. As NeurIPS is an academic conference, academia is chosen as a placeholder.
Pseudocode	No	The paper describes the proposed framework and approach in Section 3, including components like the Salient Concept-Aware Embedding Model and Synthesis Model, and provides an architectural diagram in Figure 3. However, it does not contain any explicitly labeled 'Pseudocode', 'Algorithm', or 'Algorithm X' block or structured code-like procedure.
Open Source Code	No	The NeurIPS Paper Checklist explicitly states: 'Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: Legal constraints.'
Open Datasets	Yes	We conduct experiments on widely used Fine-Grained Visual Classification (FGVC) datasets: ImageNet-1K [17] (IN), iNaturalist2018 [77] (iNat), Flower102 [56], Aircraft100 [51], CUB200-2011 [79], Cars102 [43], Stanford Dogs120 [41] and Food101 [9]. Additionally, it mentions "Places365 [101]".
Dataset Splits	Yes	For the inference data used to train the downstream classifier, we prompt Claude-3-Sonnet [2] as a separate large language model to produce contextualized descriptions of images featuring a specific class. The synthetic dataset pool consists of approximately K = 5 times the number of original training samples. For example, for a category with n real samples, we generate 5n text prompts from LLM, and then produce the same amount of synthetic images of the same category. Under few-shot and long-tail settings, we ensure a more balanced class distribution by generating at least 250 (= 50K) synthetic samples per class. We adopt an N-way 10-shot setting over three episodes, where N is the number of classes. For each class, we randomly draw 10 samples from the training set, resulting in a total of 10N training data.
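The sizing and sampling rules quoted above can be sketched as follows. This is a minimal Python illustration, assuming a per-class dictionary of real-image counts; the names `synthetic_budget` and `sample_n_way_k_shot` are hypothetical and not taken from the paper's code, which is not released. It reproduces only the stated arithmetic: K = 5 synthetic images per real image, raised to a floor of 50K = 250 per class in the few-shot and long-tail settings, and 10 real images per class per episode.

```python
import random

K = 5                    # synthetic-to-real ratio reported in the excerpt
MIN_PER_CLASS = 50 * K   # = 250, floor used under few-shot and long-tail settings


def synthetic_budget(real_counts, balanced_floor=False):
    """Number of synthetic images to generate per class.

    real_counts: dict mapping class name -> number of real training images (n).
    balanced_floor: True for the few-shot / long-tail settings, where every
        class receives at least MIN_PER_CLASS synthetic samples.
    """
    budget = {}
    for cls, n in real_counts.items():
        target = K * n  # K prompts per real image, one synthetic image per prompt
        if balanced_floor:
            target = max(target, MIN_PER_CLASS)
        budget[cls] = target
    return budget


def sample_n_way_k_shot(train_by_class, shots=10, seed=0):
    """Draw `shots` distinct real images per class for one episode (shots * N total)."""
    rng = random.Random(seed)
    return {cls: rng.sample(items, shots) for cls, items in train_by_class.items()}
```

For example, `synthetic_budget({"a": 10, "b": 100}, balanced_floor=True)` returns `{"a": 250, "b": 500}`. Calling `sample_n_way_k_shot` with three different seeds is one plausible way to form the three episodes; the exact episode sampling procedure is not specified in the excerpt.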
Hardware Specification	Yes	Table 7 reports wall-clock training times on an 8 A100 GPU setup across datasets of varying scales. For example, it takes approximately 3 days to converge with 8 A100 GPUs for iNaturalist [77] when training on a set of about 2.7 million images.
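As a rough compute estimate, assuming "approximately 3 days" means 72 hours of continuous use of all 8 GPUs, the iNaturalist run corresponds to about 576 A100 GPU-hours. The helper name below is illustrative only.

```python
def gpu_hours(wall_clock_days: float, num_gpus: int) -> float:
    """Convert a multi-GPU wall-clock training time into total GPU-hours."""
    return wall_clock_days * 24 * num_gpus


# iNaturalist run reported above: roughly 3 days on 8 A100 GPUs
inat_budget = gpu_hours(3, 8)
print(f"~{inat_budget:.0f} A100 GPU-hours")  # prints "~576 A100 GPU-hours"
```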
Software Dependencies	No	The paper mentions several software components like 'Stable Diffusion XL (SDXL) [60] implemented in Diffusers [78] repository', 'Blip2 [47]', 'MiniGPT-4 [103]', and 'AdamW [50] optimizer'. However, specific version numbers for these software packages or libraries are not provided, which is required for reproducibility.
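The missing element flagged here is a pinned version list. A minimal sketch of how such a list could be recorded with Python's standard `importlib.metadata` follows. The distribution names in `PACKAGES` are assumptions (the excerpt names the tools, not the packages), and MiniGPT-4 would more likely be pinned by repository commit than by package version.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names; the excerpt does not say which packages were used.
PACKAGES = ["torch", "diffusers", "transformers"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def to_requirements(versions):
    """Render pinned requirement lines (name==version) for installed packages."""
    return "\n".join(f"{name}=={ver}" for name, ver in versions.items() if ver)


if __name__ == "__main__":
    print(to_requirements(installed_versions(PACKAGES)))
```

Writing this output next to the training logs would give a reader the exact versions the Software Dependencies criterion looks for.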
Experiment Setup	Yes	For embedding model training, we fine-tune the model as defined in Equation 1. For synthesis model training, we only update image projector weights, image attention weights, and LoRA weights. We adopt classifier-free guidance (CFG) [32] with a strength of 5.0, and randomly apply conditional dropout to the embedding control and text control, with a 5% chance to drop the control from image, text or both, respectively. We use the AdamW [50] optimizer with a constant learning rate of 1e-5, where we update only the image projector weights, image attention weights, and LoRA weights. We train the adapted model for 10 epochs across all datasets.
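The conditional dropout and guidance rule described above can be sketched in plain Python. This sketch assumes the three dropout cases (image control only, text control only, both) are mutually exclusive with 5% probability each, which is one reading of the excerpt, and uses the standard single-condition CFG formula [32] with the reported strength of 5.0. How the paper combines the image and text controls at sampling time is not stated in this excerpt. Function names are hypothetical, and plain lists stand in for noise-prediction tensors.

```python
import random

P_DROP = 0.05     # per-case dropout probability reported in the excerpt
CFG_SCALE = 5.0   # guidance strength reported in the excerpt


def sample_dropout(rng):
    """Choose which controls to drop for one training example.

    Assumption: the three cases are mutually exclusive, each with
    probability P_DROP; otherwise both controls are kept.
    Returns (keep_image, keep_text).
    """
    u = rng.random()
    if u < P_DROP:
        return False, True    # drop the image (embedding) control only
    if u < 2 * P_DROP:
        return True, False    # drop the text control only
    if u < 3 * P_DROP:
        return False, False   # drop both controls
    return True, True         # keep both (85% of the time under this reading)


def cfg_combine(eps_uncond, eps_cond, scale=CFG_SCALE):
    """Standard classifier-free guidance, element-wise: move from the
    unconditional prediction toward (and, for scale > 1, past) the
    conditional prediction."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]
```

With `random.Random(0)`, roughly 85% of draws from `sample_dropout` keep both controls, and `cfg_combine([0.0, 1.0], [1.0, 1.0])` returns `[5.0, 1.0]`.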