Individual Fairness Guarantees for Neural Networks

Authors: Elias Benussi, Andrea Patanè, Matthew Wicker, Luca Laurenti, Marta Kwiatkowska

IJCAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply our method on four benchmarks widely employed in the fairness literature, namely, the Adult, German, Credit and Crime datasets, and an array of similarity metrics learnt from data that include ℓ, Mahalanobis, and NN embeddings. We empirically demonstrate how our method is able to provide the first, non-trivial IF certificates for NNs commonly employed for tasks from the IF literature, and even larger NNs comprising up to thousands of neurons. Furthermore, we find that our MILP-based fair training approach consistently outperforms, in terms of IF guarantees, NNs trained with a competitive state-of-the-art technique by orders of magnitude, albeit at an increased computational cost.
Researcher Affiliation | Academia | Elias Benussi¹, Andrea Patanè¹, Matthew Wicker¹, Luca Laurenti² and Marta Kwiatkowska¹ (¹University of Oxford, ²TU Delft)
Pseudocode | Yes | We summarise our fairness training method in Algorithm 1. [...] Algorithm 1 Fair Training with MILP. (A hedged sketch of such a training loop is given after the table.)
Open Source Code | Yes | An implementation of the method and of the experiments can be found at https://github.com/eliasbenussi/nn-cert-individual-fairness.
Open Datasets | Yes | We perform our experiments on four UCI datasets: the Adult dataset (predicting income), the Credit dataset (predicting payment defaults), the German dataset (predicting credit risk) and the Crime dataset. In each case, features encoding information regarding gender or race are considered sensitive. [...] http://archive.ics.uci.edu/ml
Dataset Splits | No | The paper mentions using training data and mini-batches, but does not provide specific details on how the datasets were split into training, validation, and test sets (e.g., percentages, sample counts, cross-validation setup).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact CPU/GPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. It only mentions 'increased computational cost' and training times.
Software Dependencies | No | The paper mentions using 'standard solvers from the global optimisation literature' and MILP solvers, but does not specify any particular software, libraries, or their version numbers (e.g., 'CPLEX 12.4', 'Python 3.8', 'PyTorch 1.9') that would be required for replication.
Experiment Setup | Yes | In the certification experiments we employ a precision τ for the MILP solvers of 10^-5 and a time cutoff of 180 seconds. The choice of λ affects the relative importance of standard training w.r.t. the fairness constraint: λ = 1 is equivalent to standard training, while λ = 0 only optimises for fairness. In our experiments we keep λ = 1 for half of the training epochs, and then change it to λ = 0.5. For ease of comparison, in the rest of this section we measure fairness with dfair equal to the Mahalanobis similarity metric, with ε = 0.2. We train architectures with up to 2 hidden layers and 64 units. (Both the λ schedule and the Mahalanobis check are sketched below.)
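
The paper's Algorithm 1 (Fair Training with MILP) is not reproduced in this review. As a rough illustration only, the PyTorch sketch below shows how the λ trade-off quoted in the Experiment Setup row could be wired into a training loop. The `fairness_penalty` function here is a hypothetical stand-in: in the paper, an MILP solver finds a worst-case similar individual, whereas this sketch substitutes a cheap random perturbation so it runs end to end.

```python
# Hypothetical sketch of MILP-in-the-loop fair training, based only on the
# lambda trade-off quoted above: lambda = 1 recovers standard training,
# lambda = 0 optimises fairness only. Not the paper's actual Algorithm 1.
import torch
import torch.nn as nn

def fairness_penalty(model, x, eps=0.2):
    """Stand-in for the paper's MILP step. The real method solves for a
    worst-case individual within d_fair distance eps; here we use a random
    perturbation purely so the sketch is executable."""
    x_prime = x + eps * torch.randn_like(x)
    return (model(x) - model(x_prime)).abs().mean()

def train_epoch(model, loader, opt, lam):
    bce = nn.BCEWithLogitsLoss()
    for x, y in loader:
        opt.zero_grad()
        # lambda balances the standard loss against the fairness term.
        loss = lam * bce(model(x), y) + (1 - lam) * fairness_penalty(model, x)
        loss.backward()
        opt.step()

# Toy usage on random data, with an architecture within the quoted budget
# (up to 2 hidden layers and 64 units).
model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
data = [(torch.randn(32, 8), torch.randint(0, 2, (32, 1)).float())
        for _ in range(10)]
epochs = 20
for epoch in range(epochs):
    # Quoted schedule: lambda = 1 for half of the epochs, then 0.5.
    lam = 1.0 if epoch < epochs // 2 else 0.5
    train_epoch(model, data, opt, lam)
```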
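
The review also quotes that fairness is measured with dfair equal to a Mahalanobis similarity metric with ε = 0.2. Below is a minimal sketch of such an ε-similarity check, assuming a generic positive semi-definite matrix S; in the paper the similarity metrics are learnt from data, so the identity matrix used here is purely illustrative.

```python
# Hypothetical epsilon-similarity check under a Mahalanobis metric:
# d(x, x') = sqrt((x - x')^T S (x - x')), two individuals being treated
# as comparable when d(x, x') <= eps (eps = 0.2 in the quoted setup).
import numpy as np

def mahalanobis(x, x_prime, S):
    d = x - x_prime
    return float(np.sqrt(d @ S @ d))

def are_similar(x, x_prime, S, eps=0.2):
    return mahalanobis(x, x_prime, S) <= eps

rng = np.random.default_rng(0)
x = rng.normal(size=4)
x_prime = x.copy()
x_prime[0] += 0.1        # small perturbation of one feature
S = np.eye(4)            # illustrative; identity recovers Euclidean distance
print(are_similar(x, x_prime, S))  # True: distance 0.1 <= eps 0.2
```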