Fairness for Image Generation with Uncertain Sensitive Attributes

Authors: Ajil Jalal, Sushrut Karmalkar, Jessica Hoffmann, Alex Dimakis, Eric Price

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments validate our theoretical results and achieve fair image reconstruction using state-of-the-art generative models. We implement Posterior Sampling via Langevin dynamics, study its empirical performance, and compare it to PULSE with respect to our defined metrics. We do this on the MNIST (LeCun, 1998), Flickr-Faces-HQ (Karras et al., 2019) and AFHQ cat & dog (Choi et al., 2020b) datasets.
Researcher Affiliation | Academia | (1) Department of Electrical and Computer Engineering, The University of Texas at Austin; (2) Department of Computer Science, The University of Texas at Austin.
Pseudocode | No | The paper describes its algorithms (Posterior Sampling, Langevin dynamics) textually and with equations, but does not provide a formally structured pseudocode block or algorithm box.
Open Source Code | Yes | Our code and models are available at: https://github.com/ajiljalal/code-cs-fairness
Open Datasets | Yes | We do this on the MNIST (LeCun, 1998), Flickr-Faces-HQ (Karras et al., 2019) and AFHQ cat & dog (Choi et al., 2020b) datasets.
Dataset Splits | Yes | We trained StyleGAN2 (Karras et al., 2020a) on the AFHQ cat & dog (Choi et al., 2020b) training set. ... For the 20% cat generator, we use 125 images of cats and all 500 images of dogs from the AFHQ dataset. Similarly, for the 80% cat generator, we use 500 images of cats and 125 images of dogs in the test set. ... We use a generator trained on 50% cats and 50% dogs, and study whether Posterior Sampling and PULSE satisfy RDP, SPE, and PR in practice. In this case, we use all images of cats and dogs from the AFHQ validation set. (A subsampling sketch for such biased splits appears after the table.)
Hardware Specification | No | The paper does not explicitly describe the hardware used to run the experiments beyond acknowledging general computing resources such as TACC (the Texas Advanced Computing Center).
Software Dependencies | No | The paper mentions software components such as "StyleGAN2", "NCSNv2", "VAE", "CLIP classifier", and "Resnet108", but does not specify their version numbers.
Experiment Setup | Yes | We implement Posterior Sampling via Langevin dynamics, which states that if $x_0 \sim \mathcal{N}(0, c I_n)$ (for $c$ appropriately small), then we can sample from $p(x \mid y)$ by running noisy gradient ascent: $x_{t+1} \leftarrow x_t + \gamma_t \nabla_{x_t} \log p(x_t \mid y) + \sqrt{2\gamma_t}\,\xi_t$, where $\xi_t \sim \mathcal{N}(0, I_n)$ is an i.i.d. standard Gaussian drawn at each iteration. ... Please see Appendix D for architecture-specific details. ... We trained a VAE (Kingma & Welling, 2013) on MNIST digits... We trained StyleGAN2 (Karras et al., 2020a) on the AFHQ cat & dog (Choi et al., 2020b) training set. In order to study the effect of population bias on PULSE and Posterior Sampling, we trained three models on datasets with varying bias: (1) 20% cats and 80% dogs, (2) 80% cats and 20% dogs, and (3) 50% cats and 50% dogs. (A minimal Langevin sketch follows the table.)
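
The Langevin update quoted under Experiment Setup maps directly to a few lines of code. Below is a minimal, self-contained sketch of posterior sampling via Langevin dynamics; `grad_log_posterior`, the step count, and the fixed step size `gamma` are illustrative placeholders, not the paper's settings (in the paper, the score of $p(x \mid y)$ comes from a pretrained generative model such as NCSNv2 combined with the measurement likelihood, and the step size follows a schedule $\gamma_t$).

```python
import numpy as np

def langevin_posterior_sample(grad_log_posterior, n, steps=1000, c=1e-2,
                              gamma=1e-4, rng=None):
    """Sample from p(x|y) by noisy gradient ascent (Langevin dynamics).

    `grad_log_posterior` is a hypothetical callable returning
    grad_x log p(x|y); all hyperparameters here are illustrative.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = rng.normal(0.0, np.sqrt(c), size=n)   # x_0 ~ N(0, c I_n), c small
    for _ in range(steps):
        xi = rng.standard_normal(n)           # xi_t ~ N(0, I_n), fresh each step
        # x_{t+1} = x_t + gamma_t * grad_x log p(x_t|y) + sqrt(2 gamma_t) * xi_t
        x = x + gamma * grad_log_posterior(x) + np.sqrt(2.0 * gamma) * xi
    return x

# Toy usage: for a Gaussian posterior N(mu, I), grad_x log p(x|y) = -(x - mu),
# so the chain should equilibrate around mu with stationary law N(mu, I).
mu = np.ones(4)
sample = langevin_posterior_sample(lambda x: -(x - mu), n=4,
                                   steps=5000, gamma=1e-3)
```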
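
For the biased splits quoted under Dataset Splits, one plausible way to reproduce the 125/500 cat-dog counts is plain subsampling of the AFHQ file lists. The directory layout, file pattern, and helper name below are assumptions for illustration, not taken from the paper's code.

```python
import random
from pathlib import Path

def biased_file_list(cat_dir, dog_dir, n_cats, n_dogs, seed=0):
    """Subsample per-class file lists to a target cat/dog ratio (illustrative)."""
    rng = random.Random(seed)
    cats = sorted(Path(cat_dir).glob("*.jpg"))  # assumed layout: one directory per class
    dogs = sorted(Path(dog_dir).glob("*.jpg"))
    files = rng.sample(cats, n_cats) + rng.sample(dogs, n_dogs)
    rng.shuffle(files)
    return files

# 20% cats / 80% dogs, mirroring the quoted 125 cats + 500 dogs:
split = biased_file_list("afhq/test/cat", "afhq/test/dog",
                         n_cats=125, n_dogs=500)
```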