Quality-Diversity Generative Sampling for Learning with Synthetic Data
Authors: Allen Chang, Matthew C. Fontaine, Serena Booth, Maja J. Matarić, Stefanos Nikolaidis
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using balanced synthetic datasets generated by QDGS, we first debias classifiers trained on color-biased shape datasets as a proof-of-concept. By applying QDGS to facial data synthesis, we prompt for desired semantic concepts, such as skin tone and age, to create an intersectional dataset with a combined blend of visual features. Leveraging this balanced data for training classifiers improves fairness while maintaining accuracy on facial recognition benchmarks. |
| Researcher Affiliation | Academia | University of Southern California, Los Angeles, USA; Massachusetts Institute of Technology, Cambridge, USA |
| Pseudocode | Yes | Algorithm 1: Quality-Diversity Generative Sampling (a hedged sketch of the sampling loop appears after this table). |
| Open Source Code | Yes | Code available at: https://github.com/Cylumn/qd-generative-sampling. |
| Open Datasets | Yes | Randomly sampling from StyleGAN2 (Karras et al. 2020) reproduces a skin tone imbalance (7:1 light to dark skin tone ratio) and an age imbalance (2:1 young to old ratio) from its training dataset Flickr-Faces-HQ (FFHQ) (Karras, Laine, and Aila 2019). |
| Dataset Splits | No | The paper mentions training and evaluation datasets but does not provide specific percentages, sample counts, or explicit methodology for how data was split into training, validation, and test sets for its experiments. |
| Hardware Specification | No | The authors acknowledge the Center for Advanced Research Computing (CARC) at the University of Southern California for providing computing resources that have contributed to the research results reported within this publication. URL: https://carc.usc.edu. However, specific hardware details like GPU/CPU models or memory are not provided. |
| Software Dependencies | No | The paper mentions various models and loss functions, such as ResNet101, AdaFace loss, ArcFace loss, and CLIP, but it does not provide version numbers for any underlying software libraries or dependencies (e.g., PyTorch, Python, or CUDA versions). |
| Experiment Setup | Yes | We select ResNet101 backbones and train with AdaFace loss (Kim, Jain, and Liu 2022). For each synthetic dataset, we pretrain model weights over 26 epochs with a learning rate of 0.01. (A hedged sketch of this setup follows the QDGS sketch below.) |
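
To make the Pseudocode row concrete, here is a minimal MAP-Elites-style sketch of the loop that Algorithm 1 describes: searching a generator's latent space for samples that are high quality while covering a grid of prompt-defined semantic measures. This is not the authors' exact optimizer; the prompt strings and the `generate` / `clip_similarity` helpers are illustrative placeholders (the real system would decode with a pretrained generator such as StyleGAN2 and score with CLIP).

```python
import numpy as np

# Stand-ins so the sketch runs end to end. In the paper, `generate` is a
# pretrained generator and `clip_similarity` queries CLIP; the bodies
# below are placeholders, not the authors' implementation.
def generate(z):
    return z  # identity "decoder" for illustration only

def clip_similarity(image, prompt):
    seed = sum(map(ord, prompt))  # deterministic pseudo-scorer
    w = np.random.default_rng(seed).standard_normal(image.shape[-1])
    return float(1 / (1 + np.exp(-(image @ w) / np.sqrt(image.size))))

LATENT_DIM = 512
GRID = (20, 20)                    # archive cells per measure axis
BOUNDS = [(0.0, 1.0), (0.0, 1.0)]  # ranges of the two measures

def evaluate(z):
    img = generate(z)
    objective = clip_similarity(img, "a high-quality photo of a face")
    measures = (
        clip_similarity(img, "a person with dark skin"),  # measure 1
        clip_similarity(img, "an elderly person"),        # measure 2
    )
    return objective, measures

def cell_of(measures):
    idx = []
    for m, (lo, hi), res in zip(measures, BOUNDS, GRID):
        idx.append(max(0, min(int((m - lo) / (hi - lo) * res), res - 1)))
    return tuple(idx)

rng = np.random.default_rng(0)
archive = {}  # cell index -> (objective, latent); one elite per cell
for _ in range(10_000):
    if archive and rng.random() < 0.9:
        keys = list(archive)
        _, parent = archive[keys[rng.integers(len(keys))]]
        z = parent + 0.1 * rng.standard_normal(LATENT_DIM)  # mutate an elite
    else:
        z = rng.standard_normal(LATENT_DIM)                 # random restart
    obj, meas = evaluate(z)
    cell = cell_of(meas)
    if cell not in archive or obj > archive[cell][0]:
        archive[cell] = (obj, z)  # keep the best latent per cell
```

Decoding every elite latent in the final archive yields a dataset spread across the prompt-defined measure space (e.g., skin tone and age), which is the balancing effect the paper relies on.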
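
And a minimal PyTorch sketch of the Experiment Setup row (ResNet101 backbone, 26 pretraining epochs, learning rate 0.01). The AdaFace margin head is stubbed here with plain cross-entropy; the optimizer choice, identity count, and dataloader are assumptions not stated in the paper.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet101

# ResNet101 backbone with the classifier head removed, exposing
# 2048-d embeddings, as is standard for face recognition training.
backbone = resnet101(weights=None)
backbone.fc = nn.Identity()

class AdaFaceHead(nn.Module):
    """Placeholder for the AdaFace margin head (Kim, Jain, and Liu 2022);
    the adaptive-margin logic is replaced by plain cross-entropy here."""
    def __init__(self, embed_dim, num_ids):
        super().__init__()
        self.proj = nn.Linear(embed_dim, num_ids)

    def forward(self, embeddings, labels):
        return nn.functional.cross_entropy(self.proj(embeddings), labels)

head = AdaFaceHead(2048, num_ids=10_000)  # identity count is hypothetical
opt = torch.optim.SGD(                    # optimizer is an assumption
    list(backbone.parameters()) + list(head.parameters()), lr=0.01
)

def pretrain(loader, epochs=26):          # 26 epochs per the paper
    backbone.train()
    for _ in range(epochs):
        for images, labels in loader:
            loss = head(backbone(images), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
```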