Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Are My Deep Learning Systems Fair? An Empirical Study of Fixed-Seed Training

Authors: Shangshu Qian, Viet Hung Pham, Thibaud Lutellier, Zeou Hu, Jungwon Kim, Lin Tan, Yaoliang Yu, Jiahao Chen, Sameena Shah

NeurIPS 2021 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In this paper, we conduct the first empirical study to quantify the impact of software implementation on the fairness of DL systems and its variance. Our study of 22 mitigation techniques and five baselines reveals up to 12.6% fairness variance across identical training runs with identical seeds.
Researcher Affiliation Collaboration Shangshu Qian (Purdue University, West Lafayette, IN, USA); Hung Viet Pham (University of Waterloo, Vector Institute); ...; Jiahao Chen (J. P. Morgan AI Research, New York, NY, USA); Sameena Shah (J. P. Morgan AI Research, New York, NY, USA)
Pseudocode No The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code Yes Data and code availability: Experiment data and the artifact for the reproducibility study are available in a public GitHub repository: https://github.com/lin-tan/fairness-variance/
Open Datasets Yes The experiments are performed on four popular datasets (CelebA, MS-COCO, imSitu, and CIFAR-10S) with three DL networks (ResNet-18, ResNet-50, and NIFR [47]), measured by seven popular bias metrics (Section 3).
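The seven bias metrics are not enumerated in this summary. As a generic illustration of what such a metric computes, the sketch below measures the accuracy gap between two demographic groups; this is a common fairness measure but not necessarily one of the paper's seven, and the function name and data are hypothetical.

```python
def group_accuracy_gap(preds, labels, groups):
    """Absolute accuracy gap between two demographic groups (0 and 1).

    A generic bias metric: a perfectly fair model (by this measure)
    would have a gap of 0.0.
    """
    acc = {}
    for g in (0, 1):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        correct = sum(preds[i] == labels[i] for i in idx)
        acc[g] = correct / len(idx)
    return abs(acc[0] - acc[1])


# Toy example: group 0 is classified 2/2 correctly, group 1 only 1/2.
preds = [1, 0, 1, 1]
labels = [1, 0, 0, 1]
groups = [0, 0, 1, 1]
print(group_accuracy_gap(preds, labels, groups))  # -> 0.5
```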
Dataset Splits No For each technique, all the training runs are executed with the same training data (also the original training/test split), hyper-parameters, and optimizers. The paper mentions an “original training/test split” but does not explicitly provide percentages, sample counts, or clear details for a separate validation split in the main text.
Hardware Specification Yes Details of the hardware and software environment are in Appendix B.4.
Software Dependencies Yes Details of the hardware and software environment are in Appendix B.4.
Experiment Setup Yes For each technique, all the training runs are executed with the same training data (also the original training/test split), hyper-parameters, and optimizers. With the fixed seed, all training runs also have the same order of data and the same initial weights. We perform 16 fixed-seed identical training (FIT) runs with the same random seed for each technique, and then evaluate the fairness of the trained models using seven bias metrics.
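The FIT protocol above can be sketched as follows. This is an illustrative toy, not the paper's artifact: `train_once` and the bias metric are stand-ins, and a separate unseeded RNG mimics the implementation-level nondeterminism (e.g. nondeterministic GPU kernels) that fixing the training seed does not control.

```python
import random

# Unseeded RNG standing in for implementation-level nondeterminism that
# survives a fixed training seed (e.g. nondeterministic GPU kernels).
_impl_rng = random.Random()


def train_once(seed: int) -> float:
    """One toy 'training run' returning a bias metric.

    The seeded component is identical across runs (same data order, same
    initial weights); the _impl_rng component is not, mirroring the
    variance the study quantifies.
    """
    random.seed(seed)
    seeded_component = random.random()            # identical for every run
    nondet_component = _impl_rng.uniform(-0.005, 0.005)
    return 0.1 * seeded_component + nondet_component


def fit_variance(n_runs: int = 16, seed: int = 0) -> float:
    """Spread (max - min) of the bias metric over identical-seed runs."""
    scores = [train_once(seed) for _ in range(n_runs)]
    return max(scores) - min(scores)


print(f"fairness variance over 16 fixed-seed runs: {fit_variance():.4f}")
```

In a real PyTorch pipeline the seeding step would also cover `numpy` and `torch` (e.g. `torch.manual_seed`), and the residual spread would come from the software stack itself rather than a simulated RNG.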