Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Fair Wrapping for Black-box Predictions
Authors: Alexander Soen, Ibrahim M. Alabdulmohsin, Sanmi Koyejo, Yishay Mansour, Nyalleng Moorosi, Richard Nock, Ke Sun, Lexing Xie
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We exemplify the use of our technique in three fairness notions: conditional value-at-risk, equality of opportunity, and statistical parity; and provide experiments on several readily available datasets. |
| Researcher Affiliation | Collaboration | Alexander Soen (Australian National University); Ibrahim Alabdulmohsin (Google Research); Sanmi Koyejo (Google Research; Stanford University); Yishay Mansour (Google Research; Tel Aviv University); Nyalleng Moorosi (Google Research); Richard Nock (Google Research; Australian National University); Ke Sun (Australian National University; CSIRO's Data61); Lexing Xie (Australian National University) |
| Pseudocode | Yes | Algorithm 1 TOPDOWN (Mt, t, 0, B) |
| Open Source Code | Yes | Implementation public at: https://github.com/alexandersoen/alpha-tree-fair-wrappers |
| Open Datasets | Yes | To evaluate TOPDOWN, we consider three datasets presenting a range of different sizes / feature types: Bank and German Credit (preprocessed by AIF360 [4]) and the American Community Survey (ACS) dataset preprocessed by Folktables [9]. ... We use public datasets. |
| Dataset Splits | Yes | Data is split into 3 subsets for black-box training, post-processing training, and testing, consisting of 40:40:20 splits in 5-fold cross-validation. |
| Hardware Specification | No | The ethics checklist states that hardware details appear in Appendix XI, but these details are not provided in the main text of the paper. No specific GPU or CPU models, and no detailed cloud resources, are mentioned. |
| Software Dependencies | No | For the black-box, we consider a clipped (Assumption 1 with B = 1) random forest (RF) from scikit-learn calibrated using Platt's method [23]. No specific version numbers for scikit-learn or other software dependencies are provided. |
| Experiment Setup | Yes | The RF consists of an ensemble of 50 decision trees with a maximum depth of 4 and a random selection of 10% of the training samples per decision tree. For these experiments, we consider age as a binary sensitive attribute with a bin split at 25... For each of these TOPDOWN configurations, we boost for 32 iterations. |
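The 40:40:20 split into black-box training, post-processing training, and test sets quoted above can be sketched as follows. This is an illustrative reading of the protocol only; the function name, seeding, and shuffling scheme are assumptions, not taken from the paper's code, and the per-fold repetition of the cross-validation is omitted.

```python
import random

def split_indices(n, seed=0):
    """Shuffle sample indices and split them 40:40:20 into
    black-box training, post-processing training, and test sets,
    matching the proportions reported in the paper."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    a = int(0.4 * n)   # end of black-box training portion
    b = int(0.8 * n)   # end of post-processing training portion
    return idx[:a], idx[a:b], idx[b:]

bb_train, post_train, test = split_indices(1000)
# 400 / 400 / 200 indices respectively, pairwise disjoint
```

In the reported setup this split would be redrawn for each of the 5 cross-validation folds.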
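The black-box described in the Software Dependencies and Experiment Setup rows (a scikit-learn random forest of 50 trees, maximum depth 4, 10% of training samples per tree, calibrated with Platt's method) could plausibly be configured as below. The synthetic data, the use of `max_samples` for the 10% subsampling, and the choice of `cv=3` are illustrative assumptions; the paper does not publish these exact calls.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.calibration import CalibratedClassifierCV

# Illustrative synthetic binary-classification data.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)

rf = RandomForestClassifier(
    n_estimators=50,  # ensemble of 50 decision trees
    max_depth=4,      # maximum depth of 4
    max_samples=0.1,  # random 10% of training samples per tree
    bootstrap=True,
    random_state=0,
)
# Platt scaling corresponds to method="sigmoid" in scikit-learn.
clf = CalibratedClassifierCV(rf, method="sigmoid", cv=3)
clf.fit(X, y)
proba = clf.predict_proba(X)[:, 1]  # calibrated posterior estimates in [0, 1]
```

Clipping the calibrated posteriors away from 0 and 1 (Assumption 1 with B = 1) would be applied on top of these estimates.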