Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

AugGen: Synthetic Augmentation using Diffusion Models Can Improve Recognition

Authors: Parsa Rahimi, Damien Teney, Sébastien Marcel

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Evaluated across 8 FR benchmarks, including IJB-C and IJB-B, our method achieves 1-12% performance improvements, outperforming models trained solely on real data and surpassing state-of-the-art synthetic data generation approaches, while using less real data. Notably, these gains often exceed those from architectural enhancements, underscoring the value of synthetic augmentation in data-limited scenarios. Our findings demonstrate that carefully integrated synthetic data can both mitigate privacy constraints and substantially enhance recognition performance.
Researcher Affiliation	Academia	Parsa Rahimi Noshanagh EPFL, Idiap Switzerland EMAIL Damien Teney Idiap Switzerland EMAIL Sebastien Marcel Idiap, UNIL Switzerland EMAIL
Pseudocode	Yes	Algorithm 1: Grid search for α and β Algorithm 2: Generating Daug
Open Source Code	Yes	Additionally, all the code, models and synthetic datasets will be publicly available for reproducibility. All the code, models and synthetic datasets will become publicly available,
Open Datasets	Yes	We evaluate our approach using two real-world datasets, Dorig: CASIA-WebFace [56] and a subset of WebFace4M [61]. The WebFace4M subset, referred to in this work as WebFace160K, was selected to include approximately 10,000 identities (i.e., like CASIA-WebFace), each represented by 11 to 24 samples, resulting in a total of 160K face images. More details about the datasets are presented in Appendix B. Face Recognition Benchmarks: We show that our synthetic augmentation is boosting the performance of a model trained with the real dataset in all of the studied public FR benchmarks. For this purpose, we evaluated against two sets of FR benchmarks. The first set consists of LFW [18], CFP-FP [44], CPLFW [58], CALFW [59], AgeDB [33], which includes mainly high-quality images with various lighting, poses, and ages; the average of these benchmarks is presented in Table 1 as Avg-H. The second set involves benchmarks consisting of medium to low-quality images from a realistic and more challenging FR scenario (NIST IJB-B/C) [31, 53] and TinyFace [7].
Dataset Splits	No	The paper uses established FR benchmarks (LFW, IJB-B/C, etc.) which have their own evaluation protocols, but it does not explicitly state the training/validation/test splits used for the datasets (CASIA-WebFace, WebFace160K) on which their discriminator models were trained. It describes how WebFace160K was created from WebFace4M but not its split for experiments.
Hardware Specification	Yes	The search is computationally efficient, requiring fewer than 2 GPU-days on a single consumer-grade GPU (i.e., RTX 3090 Ti in our case), with 1000 mixes (5 samples/class) per grid point. Table 5: Details of the Discriminator and its Training: GPU Type Nvidia RTX 3090 Ti, Nvidia H100. Table 6: Training times of IR50/IR101 based discriminators on Dorig or Dorig + Daug datasets next to generator's training time: GPU type 1x H100, 4x 3090 Ti
Software Dependencies	No	The paper mentions using Python, PyTorch, and CUDA in a general sense, but does not provide specific version numbers for these or any other key software libraries or solvers used in their experiments. For example, it states "We used an Exponential Moving Average (EMA) length of 10%" and refers to "latent diffusion model (LDM)" and "pixel-space diffusion model" but without concrete version details for these frameworks or libraries.
Experiment Setup	Yes	Table 5: Details of the Discriminator and its Training: Parameter Name: Discriminator Type 1, Discriminator Type 2; Network type: ResNet 50; Margin Loss: AdaFace; Batch Size: 192, 512; Optimizer Type: SGD; Momentum: 0.9; Weight Decay: 0.0005; Learning Rate: 0.1; Warm Up Epoch: 1; Number of Epochs: 26; LR Scheduler: Step; LR Milestones: [12, 24, 26]; LR Lambda: 0.1; Input Dimension: 112 × 112; Input Type: RGB images; Output Dimension: 512; Seed: 41,2048,10 (In some models), 41,2048
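
As a rough illustration of the learning-rate settings quoted in the Experiment Setup row (Learning Rate 0.1, LR Scheduler Step, LR Milestones [12, 24, 26], LR Lambda 0.1, Number of Epochs 26), the sketch below computes a per-epoch learning rate. It is not the authors' code. It assumes that "LR Lambda" is a multiplicative decay factor, that epochs are 0-indexed, and that the decay takes effect from the milestone epoch onward; the one-epoch warm-up is left out because its shape is not described. The function name `step_lr` is hypothetical.

```python
def step_lr(epoch, base_lr=0.1, milestones=(12, 24, 26), gamma=0.1):
    """Learning rate at a 0-indexed epoch under a multi-step decay.

    Assumption: the rate is multiplied by `gamma` once for every
    milestone that the current epoch has reached or passed.
    """
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


# Number of Epochs is 26, so epochs run from 0 to 25.
schedule = [step_lr(epoch) for epoch in range(26)]

for epoch in (0, 11, 12, 23, 24, 25):
    print(f"epoch {epoch:2d}: lr = {schedule[epoch]:.4f}")
```

Under these assumptions the final milestone (26) coincides with the end of training, so only the decays at epochs 12 and 24 affect the rate actually used.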