Rethinking Bias Mitigation: Fairer Architectures Make for Fairer Face Recognition

Authors: Samuel Dooley, Rhea Sukthanker, John Dickerson, Colin White, Frank Hutter, Micah Goldblum

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We train a diverse set of 29 architectures... on the two most widely used datasets... CelebA and VGGFace2. In doing so, we discover that architectures and hyperparameters have a significant impact on fairness... We discover a Pareto frontier of face recognition models that outperform existing state-of-the-art models on both test accuracy and multiple fairness metrics, often by large margins." "We conducted one NAS+HPO search for each dataset by searching on the train and validation sets. After running these searches, we identified three new candidate architectures... We then retrained each of these models and those high-performing models from Section 3 for three seeds to study the robustness of error and disparity for these models; we evaluated their performance on the validation and test sets for each dataset." (The Pareto-frontier notion is illustrated in the first sketch after this table.)
Researcher Affiliation | Collaboration | Samuel Dooley (University of Maryland, Abacus.AI; samuel@abacus.ai); Rhea Sanjay Sukthanker (University of Freiburg; sukthank@cs.uni-freiburg.de); John P. Dickerson (University of Maryland, Arthur AI; johnd@umd.edu); Colin White (Caltech, Abacus.AI; crwhite@caltech.edu); Frank Hutter (University of Freiburg; fh@cs.uni-freiburg.de); Micah Goldblum (New York University; goldblum@nyu.edu)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks; methods are described in prose.
Open Source Code | Yes | "We release our code, models and raw data files at https://github.com/dooleys/FR-NAS." "We release our code and raw results at https://github.com/dooleys/FR-NAS, so that users can easily adapt our approach to any bias metric or dataset."
Open Datasets | Yes | "We train and evaluate each model configuration on a gender-balanced subset of the two most popular face identification datasets: CelebA and VGGFace2. CelebA [69] is a large-scale face attributes dataset... VGGFace2 [8] is a much larger dataset..." (One way to build such a balanced subset is sketched after this table.)
Dataset Splits | Yes | "We split each dataset into train, validation, and test sets. We conduct our search for novel architectures using the train and validation splits, and then show the improvement of our model on the test set." (See the split sketch after this table.)
Hardware Specification | Yes | "A cumulative of 88,493 hours of computation was performed on hardware of type RTX 2080 Ti (TDP of 250W)." "All model configurations are trained with a total batch size of 64 on 8 RTX 2080 GPUs for 100 epochs each."
Software Dependencies | No | The paper mentions software tools such as PyTorch Image Models (timm), SMAC3, Hyperband, ParEGO, and the syne-tune library, but does not specify their version numbers.
Experiment Setup | Yes | "For each model, we use the default learning rate and optimizer that was published with that model. We then train the model with these hyperparameters for each of three heads, ArcFace [108], CosFace [23], and MagFace [75]. Next, we use the model's default learning rate with both AdamW [70] and SGD optimizers (again with each head choice). Finally, we also train with AdamW and SGD with unified learning rates (SGD with learning_rate=0.1 and AdamW with learning_rate=0.001)." "All model configurations are trained with a total batch size of 64 on 8 RTX 2080 GPUs for 100 epochs each." (The unified-learning-rate sweep is enumerated in the last sketch after this table.)
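
The Pareto-frontier claim in the Research Type row has a simple operational meaning: a model is on the frontier if no other model achieves both lower error and lower disparity. A minimal sketch, assuming per-model (error, disparity) pairs; the function name and the example numbers below are hypothetical, not drawn from the paper's released code:

```python
# Minimal sketch of extracting a Pareto frontier over (error, disparity) pairs,
# where lower is better on both axes. Example values are illustrative only.

def pareto_frontier(models):
    """Return the models not dominated on both error and disparity."""
    frontier = []
    for name, err, disp in models:
        dominated = any(
            e <= err and d <= disp and (e < err or d < disp)
            for _, e, d in models
        )
        if not dominated:
            frontier.append((name, err, disp))
    return frontier

# Hypothetical (error, disparity) results for a few candidate architectures.
results = [("dpn107", 0.042, 0.011), ("rexnet", 0.038, 0.015), ("vit", 0.051, 0.009)]
print(pareto_frontier(results))
```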
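For the gender-balanced CelebA subset in the Open Datasets row, one plausible construction uses torchvision's CelebA wrapper and its "Male" attribute. The undersampling strategy below is an assumption; the paper's exact balancing procedure lives in its repository and is not quoted here:

```python
# Sketch: building a gender-balanced CelebA subset with torchvision.
# Undersampling the majority gender is an assumption, not the paper's stated method.
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

celeba = datasets.CelebA(root="data", split="train", target_type="attr",
                         transform=transforms.ToTensor(), download=True)

male_col = celeba.attr_names.index("Male")   # one of CelebA's 40 binary attributes
labels = celeba.attr[:, male_col]            # 0/1 per-image gender label

male = torch.where(labels == 1)[0]
female = torch.where(labels == 0)[0]
n = min(len(male), len(female))              # undersample the larger group

g = torch.Generator().manual_seed(0)
keep = torch.cat([male[torch.randperm(len(male), generator=g)[:n]],
                  female[torch.randperm(len(female), generator=g)[:n]]])
balanced = Subset(celeba, keep.tolist())
print(len(balanced))
```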
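A hedged sketch of the three-way split from the Dataset Splits row; the 70/15/15 proportions and fixed seed are illustrative assumptions, and note that face-identification benchmarks often split by identity rather than by image:

```python
# Sketch: a train/validation/test split with torch.utils.data.random_split.
# Proportions and seed are assumptions for illustration only.
import torch
from torch.utils.data import TensorDataset, random_split

dataset = TensorDataset(torch.randn(1000, 3, 112, 112))  # stand-in for CelebA/VGGFace2
n_train, n_val = int(0.7 * len(dataset)), int(0.15 * len(dataset))
train_set, val_set, test_set = random_split(
    dataset,
    [n_train, n_val, len(dataset) - n_train - n_val],
    generator=torch.Generator().manual_seed(0))
print(len(train_set), len(val_set), len(test_set))  # 700 150 150
```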
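The unified-learning-rate portion of the sweep in the Experiment Setup row (three heads, two optimizers) is easy to enumerate. In this sketch the head names are placeholders and the backbone is a stand-in, since the real ArcFace/CosFace/MagFace heads and timm backbones live in the released repository:

```python
# Sketch: enumerating the unified-learning-rate (head x optimizer) configurations
# quoted above. Head modules are placeholders; see https://github.com/dooleys/FR-NAS.
from itertools import product
import torch

HEADS = ["ArcFace", "CosFace", "MagFace"]   # placeholder names for the three heads
UNIFIED_LRS = {"SGD": 0.1, "AdamW": 0.001}  # unified learning rates from the paper

def make_optimizer(params, name, lr):
    if name == "SGD":
        return torch.optim.SGD(params, lr=lr)
    return torch.optim.AdamW(params, lr=lr)

for head, (opt_name, lr) in product(HEADS, UNIFIED_LRS.items()):
    backbone = torch.nn.Linear(512, 128)    # stand-in for a timm backbone + head
    optimizer = make_optimizer(backbone.parameters(), opt_name, lr)
    print(f"train: head={head} optimizer={opt_name} lr={lr}")
    # ... 100 epochs at total batch size 64 across 8 GPUs, per the quoted setup ...
```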