FFB: A Fair Fairness Benchmark for In-Processing Group Fairness Methods
Authors: Xiaotian Han, Jianfeng Chi, Yu Chen, Qifan Wang, Han Zhao, Na Zou, Xia Hu
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This work offers the following key contributions: ... extensive benchmarking, which yields key insights from 45,079 experiments and 14,428 GPU hours. |
| Researcher Affiliation | Collaboration | 1 Texas A&M University, 2 Meta AI, 3 Anytime AI, 4 UIUC, 5 University of Houston, 6 Rice University |
| Pseudocode | Yes | Algorithm 1: Adv Debias in AIF360; Algorithm 2: Adv Debias in Fairlearn; Algorithm 3: Adv Debias in FFB |
| Open Source Code | Yes | The benchmark is available at https://github.com/ahxt/fair_fairness_benchmark. |
| Open Datasets | Yes | Adult (Kohavi & Becker, 1996)... German (Dua & Graff, 2017)... KDDCensus (Dua & Graff, 2017)... COMPAS (Larson et al., 2016)... Bank (Dua & Graff, 2017)... ACS-I/E/P/M/T (Ding et al., 2021)... CelebA-A/W/S (Liu et al., 2015)... UTKFace (Zhang et al., 2017)... Jigsaw (Jigsaw, 2018)... The dataset loading code is available in the benchmark repository linked above. |
| Dataset Splits | Yes | We split the data into training and test sets with random seeds, using the training set to train the model and the test set to evaluate its performance. ... The results are based on 10 trials with varying data splits and training seeds to ensure reliable outcomes. (A minimal sketch of this protocol follows the table.) |
| Hardware Specification | No | The paper reports 14,428 GPU hours but does not specify the GPU models, CPU models, or other hardware used for the experiments. |
| Software Dependencies | No | The paper mentions various software components, including AIF360 (Bellamy et al., 2018), Fairlearn (Bird et al., 2020), scikit-learn (Pedregosa et al., 2011), PyTorch-style implementations (Paszke et al., 2019), and Adam (Kingma & Ba, 2014), but provides no version numbers for any of them. |
| Experiment Setup | Yes | For tabular datasets, we use a two-layer Multi-Layer Perceptron with 256 neurons in each hidden layer for all datasets. We use Adam (Kingma & Ba, 2014) as the optimizer with a learning rate of 0.001 for both tabular and image data. ... We employ a step-decay strategy for the learning rate, halving it every 50 training steps, and stop training once the learning rate falls below 1e-5. Table 5: Common hyperparameters. Table 6: Fairness-control hyperparameter selections. Table 7: Batch sizes for the different datasets during training. (A PyTorch sketch of this setup follows the table.) |
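As a reading aid for the Dataset Splits row, the following is a minimal sketch of the quoted evaluation protocol: 10 trials with different data splits and training seeds, reporting the mean and standard deviation of a metric. The synthetic data, the logistic-regression model, the 80/20 split ratio, and the accuracy metric are assumptions for illustration only; they are not FFB's actual pipeline, which lives in the repository linked above.

```python
# Hypothetical sketch of the 10-trial protocol: vary the split/training seed,
# collect a metric per trial, and report mean ± std. All data and models here
# are placeholders, not FFB's actual datasets or fairness methods.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))                       # synthetic stand-in for a tabular dataset
y = (X[:, 0] + 0.1 * rng.normal(size=2000) > 0).astype(int)

scores = []
for seed in range(10):                                # 10 trials with varying splits and seeds
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed        # assumed 80/20 split
    )
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)   # placeholder model
    scores.append(clf.score(X_te, y_te))              # placeholder metric (accuracy)

print(f"accuracy over 10 trials: {np.mean(scores):.3f} ± {np.std(scores):.3f}")
```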
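For the Experiment Setup row, here is a minimal PyTorch sketch, under stated assumptions, of the described tabular configuration: a two-hidden-layer MLP with 256 neurons per layer, Adam with a learning rate of 0.001, the learning rate halved every 50 training steps (interpreted here as a `StepLR` schedule), and training stopped once the learning rate drops below 1e-5. The input dimension, synthetic data, batch size, and binary cross-entropy loss are placeholders; the paper's per-dataset batch sizes are in its Table 7, and FFB's actual training code is in the repository linked above.

```python
# Hypothetical sketch of the quoted tabular training setup; data and sizes are
# placeholders, not the FFB benchmark's real datasets or hyperparameters.
import torch
import torch.nn as nn

torch.manual_seed(0)
input_dim = 102                                   # assumed tabular feature dimension
x = torch.randn(1024, input_dim)                  # synthetic stand-in for a training split
y = torch.randint(0, 2, (1024,)).float()

# Two hidden layers with 256 neurons each, as described in the setup.
model = nn.Sequential(
    nn.Linear(input_dim, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 1),
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)       # Adam, lr = 0.001
# "Halving every 50 training steps" interpreted as a step schedule with gamma=0.5.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)
criterion = nn.BCEWithLogitsLoss()                               # placeholder loss

batch_size, step = 256, 0                                        # assumed batch size
while scheduler.get_last_lr()[0] >= 1e-5:         # stop once the lr decays below 1e-5
    idx = torch.randint(0, x.size(0), (batch_size,))
    optimizer.zero_grad()
    loss = criterion(model(x[idx]).squeeze(-1), y[idx])
    loss.backward()
    optimizer.step()
    scheduler.step()
    step += 1

print(f"trained for {step} steps; final lr = {scheduler.get_last_lr()[0]:.1e}")
```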