Quality-Diversity Generative Sampling for Learning with Synthetic Data
Authors: Allen Chang, Matthew C. Fontaine, Serena Booth, Maja J. Matarić, Stefanos Nikolaidis
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using balanced synthetic datasets generated by QDGS, we first debias classifiers trained on color-biased shape datasets as a proof-of-concept. By applying QDGS to facial data synthesis, we prompt for desired semantic concepts, such as skin tone and age, to create an intersectional dataset with a combined blend of visual features. Leveraging this balanced data for training classifiers improves fairness while maintaining accuracy on facial recognition benchmarks. |
| Researcher Affiliation | Academia | University of Southern California, Los Angeles, USA; Massachusetts Institute of Technology, Cambridge, USA |
| Pseudocode | Yes | Algorithm 1: Quality-Diversity Generative Sampling (a hedged sketch of the sampling loop appears after this table). |
| Open Source Code | Yes | Code available at: https://github.com/Cylumn/qd-generative-sampling. |
| Open Datasets | Yes | Randomly sampling from StyleGAN2 (Karras et al. 2020) reproduces a skin tone imbalance (7:1 light to dark skin tone ratio) and an age imbalance (2:1 young to old ratio) from its training dataset Flickr-Faces-HQ (FFHQ) (Karras, Laine, and Aila 2019). |
| Dataset Splits | No | The paper mentions training and evaluation datasets but does not provide specific percentages, sample counts, or explicit methodology for how data was split into training, validation, and test sets for its experiments. |
| Hardware Specification | No | The authors acknowledge the Center for Advanced Research Computing (CARC) at the University of Southern California for providing computing resources that have contributed to the research results reported within this publication. URL: https://carc.usc.edu. However, specific hardware details like GPU/CPU models or memory are not provided. |
| Software Dependencies | No | The paper mentions various models and loss functions, such as ResNet101, AdaFace loss, ArcFace loss, and CLIP, but it does not provide version numbers for any underlying software libraries or dependencies (e.g., PyTorch, Python, or CUDA versions). |
| Experiment Setup | Yes | We select ResNet101 backbones and train with AdaFace loss (Kim, Jain, and Liu 2022). For each synthetic dataset, we pretrain model weights over 26 epochs with a learning rate of 0.01. (A hedged sketch of this setup follows the QDGS sketch below.) |
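
To make the Pseudocode row concrete, here is a minimal MAP-Elites-style sketch of the loop that Algorithm 1 describes: searching a generator's latent space for samples that are high quality while covering a grid of prompt-defined semantic measures. This is not the authors' exact optimizer; the prompt strings and the `generate` / `clip_similarity` helpers are illustrative placeholders (the real system would decode with a pretrained generator such as StyleGAN2 and score with CLIP).

```python
import numpy as np

# Stand-ins so the sketch runs end to end. In the paper, `generate` is a
# pretrained generator and `clip_similarity` queries CLIP; the bodies
# below are placeholders, not the authors' implementation.
def generate(z):
    return z  # identity "decoder" for illustration only

def clip_similarity(image, prompt):
    seed = sum(map(ord, prompt))  # deterministic pseudo-scorer
    w = np.random.default_rng(seed).standard_normal(image.shape[-1])
    return float(1 / (1 + np.exp(-(image @ w) / np.sqrt(image.size))))

LATENT_DIM = 512
GRID = (20, 20)                    # archive cells per measure axis
BOUNDS = [(0.0, 1.0), (0.0, 1.0)]  # ranges of the two measures

def evaluate(z):
    img = generate(z)
    objective = clip_similarity(img, "a high-quality photo of a face")
    measures = (
        clip_similarity(img, "a person with dark skin"),  # measure 1
        clip_similarity(img, "an elderly person"),        # measure 2
    )
    return objective, measures

def cell_of(measures):
    idx = []
    for m, (lo, hi), res in zip(measures, BOUNDS, GRID):
        idx.append(max(0, min(int((m - lo) / (hi - lo) * res), res - 1)))
    return tuple(idx)

rng = np.random.default_rng(0)
archive = {}  # cell index -> (objective, latent); one elite per cell
for _ in range(10_000):
    if archive and rng.random() < 0.9:
        keys = list(archive)
        _, parent = archive[keys[rng.integers(len(keys))]]
        z = parent + 0.1 * rng.standard_normal(LATENT_DIM)  # mutate an elite
    else:
        z = rng.standard_normal(LATENT_DIM)                 # random restart
    obj, meas = evaluate(z)
    cell = cell_of(meas)
    if cell not in archive or obj > archive[cell][0]:
        archive[cell] = (obj, z)  # keep the best latent per cell
```

Decoding every elite latent in the final archive yields a dataset spread across the prompt-defined measure space (e.g., skin tone and age), which is the balancing effect the paper relies on.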
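
And a minimal PyTorch sketch of the Experiment Setup row (ResNet101 backbone, 26 pretraining epochs, learning rate 0.01). The AdaFace margin head is stubbed here with plain cross-entropy; the optimizer choice, identity count, and dataloader are assumptions not stated in the paper.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet101

# ResNet101 backbone with the classifier head removed, exposing
# 2048-d embeddings, as is standard for face recognition training.
backbone = resnet101(weights=None)
backbone.fc = nn.Identity()

class AdaFaceHead(nn.Module):
    """Placeholder for the AdaFace margin head (Kim, Jain, and Liu 2022);
    the adaptive-margin logic is replaced by plain cross-entropy here."""
    def __init__(self, embed_dim, num_ids):
        super().__init__()
        self.proj = nn.Linear(embed_dim, num_ids)

    def forward(self, embeddings, labels):
        return nn.functional.cross_entropy(self.proj(embeddings), labels)

head = AdaFaceHead(2048, num_ids=10_000)  # identity count is hypothetical
opt = torch.optim.SGD(                    # optimizer is an assumption
    list(backbone.parameters()) + list(head.parameters()), lr=0.01
)

def pretrain(loader, epochs=26):          # 26 epochs per the paper
    backbone.train()
    for _ in range(epochs):
        for images, labels in loader:
            loss = head(backbone(images), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
```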