Are My Deep Learning Systems Fair? An Empirical Study of Fixed-Seed Training

Authors: Shangshu Qian, Hung Viet Pham, Thibaud Lutellier, Zeou Hu, Jungwon Kim, Lin Tan, Yaoliang Yu, Jiahao Chen, Sameena Shah

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we conduct the first empirical study to quantify the impact of software implementation on the fairness and its variance of DL systems. Our study of 22 mitigation techniques and five baselines reveals up to 12.6% fairness variance across identical training runs with identical seeds.
Researcher Affiliation | Collaboration | Shangshu Qian, Purdue University, West Lafayette, IN, USA (shangshu@purdue.edu); Hung Viet Pham, University of Waterloo and Vector Institute (hvpham@uwaterloo.ca); ... Jiahao Chen, J. P. Morgan AI Research, New York, NY, USA (jiahao@getparity.ai); Sameena Shah, J. P. Morgan AI Research, New York, NY, USA (sameena.shah@jpmorgan.com)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Data and code availability: Experiment data and artifact for reproducibility study are available in a public GitHub repository (https://github.com/lin-tan/fairness-variance/).
Open Datasets | Yes | The experiments are performed on four popular datasets (CelebA, MS-COCO, imSitu, and CIFAR-10S) with three DL networks (ResNet-18, ResNet-50, and NIFR [47]), measured by seven popular bias metrics (Section 3).
Dataset Splits | No | For each technique, all the training runs are executed with the same training data (also the original training/test split), hyper-parameters, and optimizers. The paper mentions an “original training/test split” but does not explicitly provide split percentages, sample counts, or details of a separate validation split in the main text.
Hardware Specification | Yes | Details of the hardware and software environment are in Appendix B.4.
Software Dependencies | Yes | Details of the hardware and software environment are in Appendix B.4.
Experiment Setup | Yes | For each technique, all the training runs are executed with the same training data (also the original training/test split), hyper-parameters, and optimizers. With the fixed seed, all training runs also have the same order of data and the same initial weights. We perform 16 FIT (fixed-seed identical training) runs with the same random seed for each technique, and then evaluate the fairness of the trained models using seven bias metrics.
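
For concreteness, the sketch below (not the authors' code) illustrates the FIT protocol described in the Experiment Setup row: fix every controllable random seed, repeat training several times with that same seed, and measure the spread of a fairness metric across the identically seeded runs. The toy model, synthetic data, and the `accuracy_gap` metric are illustrative stand-ins for the paper's networks, datasets, and seven bias metrics.

```python
"""Minimal sketch of fixed-seed identical training (FIT) runs.

Not the authors' implementation: the model, data, and bias metric here
are placeholders used only to show the shape of the protocol.
"""
import random
import numpy as np
import torch
import torch.nn as nn


def set_seed(seed: int) -> None:
    # Fix every source of randomness we control. Nondeterministic GPU
    # kernels and other implementation-level choices can still differ.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)


def accuracy_gap(preds, labels, groups) -> float:
    # Stand-in bias metric: absolute accuracy difference between two
    # protected groups (0 and 1). The paper uses seven established metrics.
    accs = []
    for g in (0, 1):
        mask = groups == g
        accs.append((preds[mask] == labels[mask]).float().mean().item())
    return abs(accs[0] - accs[1])


def one_fit_run(seed: int) -> float:
    set_seed(seed)
    # Toy data: 2 features, binary label, binary protected group.
    x = torch.randn(512, 2)
    groups = (torch.rand(512) < 0.5).long()
    y = ((x[:, 0] + 0.3 * groups.float()) > 0).long()

    model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(50):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

    preds = model(x).argmax(dim=1)
    return accuracy_gap(preds, y, groups)


if __name__ == "__main__":
    # 16 identically seeded runs, mirroring the paper's protocol.
    gaps = [one_fit_run(seed=0) for _ in range(16)]
    print("bias metric per run:", [f"{g:.4f}" for g in gaps])
    print(f"spread across runs (max - min): {max(gaps) - min(gaps):.4f}")
```

On a CPU-only toy model like this, the 16 runs are typically bit-identical and the reported spread is zero; the paper's finding is that in realistic GPU training pipelines, implementation-level nondeterminism can make identically seeded runs differ in measured fairness.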