FFB: A Fair Fairness Benchmark for In-Processing Group Fairness Methods

Authors: Xiaotian Han, Jianfeng Chi, Yu Chen, Qifan Wang, Han Zhao, Na Zou, Xia Hu

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This work offers the following key contributions: ... extensive benchmarking, which yields key insights from 45,079 experiments and 14,428 GPU hours.
Researcher Affiliation | Collaboration | Texas A&M University, Meta AI, Anytime AI, UIUC, University of Houston, Rice University
Pseudocode | Yes | Algorithm 1: Adv Debias in AIF360; Algorithm 2: Adv Debias in Fairlearn; Algorithm 3: Adv Debias in FFB
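For context on the three "Adv Debias" listings, a minimal gradient-reversal sketch of adversarial debiasing follows. This is not the paper's exact Algorithm 1-3; the toy data, the adversary's form (a logistic probe on the classifier logit), and the values of `alpha` and `lr` are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
s = (rng.random(n) < 0.5).astype(float)          # binary sensitive attribute
y = ((X[:, 0] + 0.5 * s) > 0).astype(float)      # task label, correlated with s

w = np.zeros(d)       # classifier weights (predicts y from X)
u = 0.0               # adversary weight (predicts s from the classifier logit)
lr, alpha = 0.1, 0.5  # alpha trades accuracy vs. fairness (assumed value)

for _ in range(300):
    logit = X @ w
    p = sigmoid(logit)                    # task prediction
    a = sigmoid(u * logit)                # adversary's guess of s
    g_task = X.T @ (p - y) / n            # grad of task BCE w.r.t. w
    g_adv_w = X.T @ ((a - s) * u) / n     # grad of adversary BCE w.r.t. w
    g_adv_u = np.mean((a - s) * logit)    # grad of adversary BCE w.r.t. u
    u -= lr * g_adv_u                     # adversary minimizes its own loss
    # classifier: descend task loss, ascend adversary loss (gradient reversal)
    w -= lr * (g_task - alpha * g_adv_w)
```

The gradient-reversal step is the common core of all three library variants benchmarked here: the classifier is penalized whenever its logit carries information the adversary can use to recover the sensitive attribute.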
Open Source Code | Yes | The benchmark is available at https://github.com/ahxt/fair_fairness_benchmark.
Open Datasets | Yes | Adult (Kohavi & Becker, 1996)... German (Dua & Graff, 2017)... KDDCensus (Dua & Graff, 2017)... COMPAS (Larson et al., 2016)... Bank (Dua & Graff, 2017)... ACS-I/E/P/M/T (Ding et al., 2021)... CelebA-A/W/S (Liu et al., 2015)... UTKFace (Zhang et al., 2017)... Jigsaw (Jigsaw, 2018)... The dataset loading codes are at this url.
Dataset Splits | Yes | We also split the data into training and test sets with random seeds. We use the training set to train the model and the test set to evaluate the model's performance. ... The results are based on 10 trials with varying data splits and training seeds, to ensure reliable outcomes.
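The quoted protocol (seeded random splits, 10 trials) can be sketched with stdlib Python. The helper name `split_indices` and the 80/20 test fraction are assumptions; the paper only states that splits vary with the seed across 10 trials.

```python
import random

def split_indices(n, test_frac=0.2, seed=0):
    """Shuffle example indices with a trial-specific seed, then carve
    off a held-out test set (test_frac is an assumed value)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * (1 - test_frac))
    return idx[:cut], idx[cut:]

# ten trials with different seeds, as in the paper's protocol
splits = [split_indices(1000, seed=s) for s in range(10)]
```

Seeding a dedicated `random.Random` instance per trial keeps splits reproducible without mutating global RNG state.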
Hardware Specification | No | The paper mentions '14,428 GPU hours' but does not specify any particular GPU models, CPU models, or other hardware specifications used for the experiments.
Software Dependencies | No | The paper mentions various software components like 'AIF360 (Bellamy et al., 2018)', 'Fairlearn (Bird et al., 2020)', 'Scikit-learn (Pedregosa et al., 2011)', 'Pytorch-style (Paszke et al., 2019)', and 'Adam (Diederik P. Kingma, 2014)', but no specific version numbers are provided for any of these.
Experiment Setup | Yes | For tabular datasets, we use a two-layer Multi-layer Perceptron with 256 neurons each for all datasets. We use Adam (Diederik P. Kingma, 2014) as the optimizer with a learning rate of 0.001 for both tabular and image data. ... We employ a linear decay strategy for the learning rate, halving it every 50 training steps. The model training is stopped when the learning rate decreases to a value below 1e-5. Table 5: Common Hyper-parameters. Table 6: The fairness control hyperparameter selections. Table 7: The batch size for different datasets during the training.
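The quoted schedule (start at 0.001, halve every 50 steps, stop below 1e-5) fully determines the training length, which a short generator makes explicit. Note the quote calls this "linear decay" although halving is geometric; the sketch follows the halving description. The function name is an assumption.

```python
def lr_schedule(initial_lr=1e-3, decay_every=50, floor=1e-5):
    """Yield (step, lr) pairs: halve the learning rate every
    `decay_every` steps; stop once it falls below `floor`."""
    lr, step = initial_lr, 0
    while lr >= floor:
        yield step, lr
        step += 1
        if step % decay_every == 0:
            lr /= 2.0

pairs = list(lr_schedule())
```

Under these numbers the rate is halved seven times (1e-3 / 2^7 ≈ 7.8e-6 < 1e-5), so training runs for exactly 350 steps.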