Quality-Diversity Generative Sampling for Learning with Synthetic Data

Authors: Allen Chang, Matthew C. Fontaine, Serena Booth, Maja J. Matarić, Stefanos Nikolaidis

AAAI 2024

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental
  Using balanced synthetic datasets generated by QDGS, we first debias classifiers trained on color-biased shape datasets as a proof-of-concept. By applying QDGS to facial data synthesis, we prompt for desired semantic concepts, such as skin tone and age, to create an intersectional dataset with a combined blend of visual features. Leveraging this balanced data for training classifiers improves fairness while maintaining accuracy on facial recognition benchmarks.
Researcher Affiliation: Academia
  (1) University of Southern California, Los Angeles, USA; (2) Massachusetts Institute of Technology, Cambridge, USA
Pseudocode: Yes
  Algorithm 1: Quality-Diversity Generative Sampling
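Algorithm 1 itself is not reproduced on this page. As a hedged illustration of what quality-diversity sampling typically looks like, here is a minimal MAP-Elites-style loop over a generator's latent space; the generator, quality objective, and diversity measure below are toy stand-ins, not the paper's implementation.

```python
import random

def qd_sample(generate, quality, measures, bins, iters=1000, seed=0):
    """Keep the highest-quality latent found in each measure cell (MAP-Elites-style archive)."""
    rng = random.Random(seed)
    archive = {}  # cell index tuple -> (quality, latent)
    for _ in range(iters):
        z = [rng.gauss(0, 1) for _ in range(4)]  # toy 4-D latent sample
        item = generate(z)
        # Discretize each diversity measure (assumed to map into [0, 1]) into a grid cell.
        cell = tuple(min(int(m(item) * bins), bins - 1) for m in measures)
        q = quality(item)
        if cell not in archive or q > archive[cell][0]:
            archive[cell] = (q, z)  # new elite for this cell
    return archive

# Toy stand-ins (hypothetical): the "generated image" is just the latent itself.
gen = lambda z: z
quality_fn = lambda x: -sum(v * v for v in x)  # prefer latents near the origin
measure_fns = [lambda x: min(max((x[0] + 3.0) / 6.0, 0.0), 1.0)]  # one measure in [0, 1]

archive = qd_sample(gen, quality_fn, measure_fns, bins=10)
balanced = [z for _, z in archive.values()]  # one elite per cell gives balanced coverage
```

Sampling one elite per cell, rather than sampling latents uniformly, is what lets a QD archive cover under-represented regions of the measure space.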
Open Source Code: Yes
  Code available at: https://github.com/Cylumn/qd-generative-sampling
Open Datasets: Yes
  Randomly sampling from StyleGAN2 (Karras et al. 2020) reproduces a skin tone imbalance (7:1 light to dark skin tone ratio) and an age imbalance (2:1 young to old ratio) from its training dataset, Flickr-Faces-HQ (FFHQ) (Karras, Laine, and Aila 2019).
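The reported imbalances are simple majority-to-minority count ratios over sampled attribute labels; a small sketch of that check, using illustrative counts rather than the paper's actual samples:

```python
from collections import Counter

def imbalance_ratio(labels, majority, minority):
    """Ratio of majority-attribute count to minority-attribute count in a sample."""
    counts = Counter(labels)
    return counts[majority] / counts[minority]

# Illustrative label counts only, mirroring the reported 7:1 skin-tone imbalance.
sampled = ["light"] * 700 + ["dark"] * 100
ratio = imbalance_ratio(sampled, "light", "dark")
print(ratio)  # 7.0
```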
Dataset Splits: No
  The paper mentions training and evaluation datasets but does not provide specific percentages, sample counts, or an explicit methodology for how data was split into training, validation, and test sets for its experiments.
Hardware Specification: No
  The authors acknowledge the Center for Advanced Research Computing (CARC) at the University of Southern California (https://carc.usc.edu) for providing computing resources, but specific hardware details such as GPU/CPU models or memory are not provided.
Software Dependencies: No
  The paper mentions various models and loss functions, such as ResNet101, AdaFace loss, ArcFace loss, and CLIP, but it does not provide version numbers for any underlying software libraries or dependencies (e.g., PyTorch, Python, or CUDA versions).
Experiment Setup: Yes
  We select ResNet101 backbones and train with AdaFace loss (Kim, Jain, and Liu 2022). For each synthetic dataset, we pretrain model weights over 26 epochs with a learning rate of 0.01.
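The quoted setup pins down the backbone, loss, epoch count, and learning rate; a minimal config-and-loop sketch of that pretraining schedule follows. Batch size and any learning-rate schedule are not reported, so the constant schedule here is an assumption.

```python
# Hyperparameters quoted in the setup: ResNet101 backbone, AdaFace loss,
# 26 epochs, learning rate 0.01. Batch size and LR decay are NOT reported;
# the constant schedule below is an assumption.
PRETRAIN_CONFIG = {
    "backbone": "ResNet101",
    "loss": "AdaFace",
    "epochs": 26,
    "learning_rate": 0.01,
}

def pretrain(run_epoch, config):
    """Drive config['epochs'] passes; run_epoch is a stand-in for one training epoch."""
    lrs = []
    for epoch in range(config["epochs"]):
        lr = config["learning_rate"]  # constant by assumption; no schedule is reported
        lrs.append(lr)
        run_epoch(epoch, lr)
    return lrs

schedule = pretrain(lambda epoch, lr: None, PRETRAIN_CONFIG)
```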