Certifying Robustness to Programmable Data Bias in Decision Trees

Authors: Anna Meyer, Aws Albarghouthi, Loris D'Antoni

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on datasets that are commonly used in the fairness literature, and demonstrate our approach's viability on a range of bias models. Our tool can certify pointwise robustness for a variety of bias models; we also show that some datasets have unequal robustness-certification rates across demographic groups.
Researcher Affiliation | Academia | Anna P. Meyer, Aws Albarghouthi, and Loris D'Antoni, Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, {annameyer, aws, loris}@cs.wisc.edu
Pseudocode | No | The paper describes the learning algorithm and abstract transformers mathematically and textually but does not provide a formal pseudocode block or algorithm listing.
Open Source Code | Yes | For all datasets, we use the standard train/test split if one is provided; otherwise, we create our own train/test splits, which are available in our code repository at https://github.com/annapmeyer/antidote-P.
Open Datasets | Yes | We evaluate on Adult Income [17] (training n=32,561), COMPAS [29] (n=4,629), and Drug Consumption [20] (n=1,262). A fourth dataset, MNIST 1/7 (n=13,007), is in the Appendix.
Dataset Splits | No | The paper mentions using the "standard train/test split" or creating "our own train/test splits" and points to the code repository for details. It does not explicitly state validation split percentages or methodology within the main text.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper states that the technique is implemented in C++ but does not provide specific version numbers for any software dependencies, libraries, or compilers used.
Experiment Setup | Yes | For each dataset, we choose the smallest tree depth where accuracy improves no more than 1% at the next-highest depth. For Adult Income and MNIST 1/7, this threshold is depth 2 (accuracy 83% and 97%, respectively); for COMPAS and Drug Consumption it is depth 1 (accuracy 64% and 76%, respectively).
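
As a rough illustration of the depth-selection rule quoted in the Experiment Setup row, the following is a minimal Python sketch. It is not the authors' C++ tool: the use of scikit-learn, the select_depth helper and its arguments, and the search cap of 10 are assumptions made here for illustration; only the "improves no more than 1% at the next-highest depth" rule comes from the paper.

    # Hypothetical sketch of the depth-selection rule described above:
    # pick the smallest depth whose accuracy gain at the next-highest
    # depth is at most 1%. Uses scikit-learn, not the authors' C++ tool.
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    def select_depth(X_train, y_train, X_test, y_test,
                     max_depth=10, threshold=0.01):
        # Test-set accuracy for each candidate depth 1..max_depth.
        accs = []
        for depth in range(1, max_depth + 1):
            clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
            clf.fit(X_train, y_train)
            accs.append(accuracy_score(y_test, clf.predict(X_test)))

        # Smallest depth where moving to depth+1 gains no more than `threshold`.
        for i in range(len(accs) - 1):
            if accs[i + 1] - accs[i] <= threshold:
                return i + 1  # accs[i] corresponds to depth i+1
        return max_depth

On a train/test split of a dataset like Adult Income, this rule would be expected to return depth 2, matching the depth and roughly 83% accuracy reported in the row above.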