Fair Generative Models via Transfer Learning

Authors: Christopher T.H. Teo, Milad Abdollahzadeh, Ngai-Man Cheung

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that fairTL and fairTL++ achieve state-of-the-art in both quality and fairness of generated samples. The code and additional resources can be found at bearwithchris.github.io/fairTL/
Researcher Affiliation | Academia | Singapore University of Technology and Design (SUTD) christopher_teo@mymail.sutd.edu.sg, {milad_abdollahzadeh, ngaiman_cheung}@sutd.edu.sg
Pseudocode | No | The paper describes the methods verbally and with equations (e.g., Eqn. 1, Eqn. 2) but does not include formal pseudocode or algorithm blocks.
Open Source Code | Yes | The code and additional resources can be found at bearwithchris.github.io/fairTL/
Open Datasets | Yes | Dataset. We consider the datasets CelebA (Liu et al. 2015) and UTKFace (Zhang, Song, and Qi 2017) for this experiment.
Dataset Splits | No | The paper discusses the ratios of Dref to Dbias (e.g., 'perc = {0.25, 0.1, 0.05, 0.025}') but does not explicitly state conventional train/validation/test splits in percentages or specific counts for reproducibility.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for the experiments.
Software Dependencies | No | The paper mentions using BigGAN and StyleGAN2 but does not provide specific version numbers for software dependencies or libraries (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | Eqn. 2 presents the loss function, where we utilize λ ∈ [0, 1] as a hyper-parameter to control the balance between enforcing fairness and quality. In our experiments, we found that although both discriminators play an essential part in improving the performance of the GAN, more emphasis should be placed on Dt. In particular, since Ds is frozen, making λ too small results in instability during training. Conversely, making λ too big limits the feedback we get on the sample's quality. Empirically, we found λ = 0.6 to be ideal.
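The λ-weighted balance between the frozen source discriminator Ds and the adapted target discriminator Dt quoted above can be illustrated with a short sketch. This is a minimal, hypothetical PyTorch rendering of the generator-side objective implied by Eqn. 2, not the authors' released code; the names G, D_s, D_t, the hinge-style loss terms, and the weighting convention (λ on Dt, 1 − λ on Ds) are assumptions made for illustration.

```python
# Hypothetical sketch of a lambda-weighted dual-discriminator generator loss,
# assuming lam weights the trainable target discriminator D_t and (1 - lam)
# weights the frozen source discriminator D_s, as the quoted passage suggests.
import torch

def generator_loss(G, D_s, D_t, z, lam=0.6):
    """Combine adversarial feedback from D_t with quality feedback from the
    frozen, pre-trained D_s; lam is assumed to lie in [0, 1] (0.6 per the paper)."""
    fake = G(z)
    loss_t = -D_t(fake).mean()  # non-saturating term from the adapted discriminator D_t
    loss_s = -D_s(fake).mean()  # quality feedback from the frozen source discriminator D_s
    return lam * loss_t + (1.0 - lam) * loss_s
```

With this weighting, lam = 0.6 places slightly more emphasis on D_t while still keeping a substantial quality signal from D_s, matching the trade-off described in the quoted setup.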