Confidential-PROFITT: Confidential PROof of FaIr Training of Trees
Authors: Ali Shahin Shamsabadi, Sierra Calanda Wyllie, Nicholas Franzese, Natalie Dullerud, Sébastien Gambs, Nicolas Papernot, Xiao Wang, Adrian Weller
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show empirically that bounding the information gain of each node with respect to the sensitive attributes reduces the unfairness of the final tree. In extensive experiments on the COMPAS, Communities and Crime, Default Credit, and Adult datasets, we demonstrate that a company can use Confidential-PROFITT to certify the fairness of their decision tree to an auditor in less than 2 minutes, thus indicating the applicability of our approach. |
| Researcher Affiliation | Collaboration | Ali Shahin Shamsabadi (1), Sierra Wyllie (2,3), Nicholas Franzese (4), Natalie Dullerud (2,3), Sébastien Gambs* (5), Nicolas Papernot* (2,3), Xiao Wang* (4), Adrian Weller* (1,6). Affiliations: (1) The Alan Turing Institute, (2) University of Toronto, (3) Vector Institute, (4) Northwestern University, (5) Université du Québec à Montréal, (6) University of Cambridge |
| Pseudocode | Yes | Algorithm 1: Finding the best split for each node using our fair learning algorithm. Algorithm 2: ZK proof of demographic parity fair tree training. For equalized odds fair tree training see Appendix F. Algorithm 3: Recursively building decision tree. Algorithm 4: Finding the best (fairness-oblivious) split for each node. Algorithm 5: Zero-knowledge proof of equalized odds-aware tree training. Algorithm 6: ZK proof of fair training of a random forest. |
| Open Source Code | Yes | The code is available at https://github.com/cleverhans-lab/Confidential-PROFITT. |
| Open Datasets | Yes | We assess the performance of Confidential-PROFITT using four common datasets for fairness benchmarking: COMPAS (Angwin et al., 2016), Communities and Crime (Redmond, 2009), Adult Income (Adu, 1996), and Default Credit (Def, 2016). |
| Dataset Splits | Yes | We evaluate fairness and accuracy using Fairlearn (Bird et al., 2020) and SciPy (Virtanen et al., 2020) over a testing set using a train-test split of 75% : 25%. |
| Hardware Specification | Yes | We use EMP-toolkit (Wang et al., 2016) to implement our ZK protocol. EMP is written in C++ and offers efficient implementations of ZK protocols. This code base is used for timing results (benchmarking the efficiency of our ZK protocol) and conducted using two Amazon EC2 c6a.2xlarge machines to represent the prover and verifier. |
| Software Dependencies | No | The paper mentions "EMP-toolkit (Wang et al., 2016)", "JSAT (Raff, 2017)", "Fairlearn (Bird et al., 2020)", and "SciPy (Virtanen et al., 2020)" but only specifies the Java version for JSAT ("Java (v. 14.0.2)"). It does not provide specific versions for other libraries. |
| Experiment Setup | Yes | To evaluate Confidential-PROFITT, we train decision trees and random forests for 250 values of τ with 10 random seeds each. For each dataset, we set the height of the tree by observing test and training set results in a decision tree trained without fairness. We choose the smallest height that maintains accuracy without overfitting. These heights are reported for each dataset in Table 4. |
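The core idea quoted above, bounding the information gain of each split with respect to the sensitive attribute by a threshold τ, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation (which runs inside a ZK proof via EMP-toolkit); the function names (`best_fair_split`, `information_gain`) and the exhaustive threshold scan are assumptions for clarity.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of a discrete label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(values, mask):
    """Entropy reduction in `values` from splitting by boolean `mask`."""
    n = len(values)
    left, right = values[mask], values[~mask]
    if len(left) == 0 or len(right) == 0:
        return 0.0  # degenerate split: no information gained
    return entropy(values) - (len(left) / n * entropy(left)
                              + len(right) / n * entropy(right))

def best_fair_split(X, y, s, tau):
    """Return (gain, feature, threshold) maximizing gain on labels `y`,
    restricted to splits whose gain w.r.t. the sensitive attribute `s`
    is at most `tau` (the fairness bound)."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            mask = X[:, j] <= t
            if information_gain(s, mask) > tau:
                continue  # split reveals too much about the sensitive attribute
            gain = information_gain(y, mask)
            if best is None or gain > best[0]:
                best = (gain, j, t)
    return best
```

With τ = 0 only splits carrying no information about the sensitive attribute are admissible; raising τ trades fairness for accuracy, which matches the paper's sweep over 250 values of τ.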