reproducibilityindex.ai

Certifying Robustness to Programmable Data Bias in Decision Trees

Authors: Anna Meyer, Aws Albarghouthi, Loris D'Antoni

NeurIPS 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate our approach on datasets that are commonly used in the fairness literature, and demonstrate our approach's viability on a range of bias models. We evaluate our approach on a number of bias models and datasets from the fairness literature. Our tool can certify pointwise robustness for a variety of bias models; we also show that some datasets have unequal robustness-certification rates across demographic groups.
Researcher Affiliation	Academia	Anna P. Meyer, Aws Albarghouthi, and Loris D'Antoni, Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706. {annameyer, aws, loris}@cs.wisc.edu
Pseudocode	No	The paper describes the learning algorithm and abstract transformers mathematically and textually but does not provide a formal pseudocode block or algorithm listing.
Open Source Code	Yes	For all datasets, we use the standard train/test split if one is provided; otherwise, we create our own train/test splits, which are available in our code repository at https://github.com/annapmeyer/antidote-P.
Open Datasets	Yes	We evaluate on Adult Income [17] (training n=32,561), COMPAS [29] (n=4629), and Drug Consumption [20] (n=1262). A fourth dataset, MNIST 1/7 (n=13,007), is in the Appendix.
Dataset Splits	No	The paper mentions using "standard train/test split" or creating "our own train/test splits" and points to the code repository for details. It does not explicitly state validation split percentages or methodology within the main text.
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies	No	The paper states that the technique is implemented in C++ but does not provide specific version numbers for any software dependencies, libraries, or compilers used.
Experiment Setup	Yes	For each dataset, we choose the smallest tree depth where accuracy improves no more than 1% at the next-highest depth. For Adult Income and MNIST 1/7, this threshold is depth 2 (accuracy 83% and 97%, respectively); for COMPAS and Drug Consumption it is depth 1 (accuracy 64% and 76%, respectively).
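
The depth-selection rule quoted in the Experiment Setup row can be sketched as follows. This is an illustrative sketch, not the authors' code: the function name `select_depth`, the accuracy lists, and the reading of "1%" as one absolute percentage point are all assumptions made for the example.

```python
from typing import Sequence


def select_depth(accuracies: Sequence[float], threshold: float = 0.01) -> int:
    """Return the smallest depth whose successor improves accuracy by at most `threshold`.

    `accuracies[i]` is assumed to be the accuracy of a tree of depth i + 1.
    `threshold` is interpreted as absolute accuracy (0.01 = one percentage point);
    this interpretation is an assumption, not something the paper excerpt states.
    """
    for i in range(len(accuracies) - 1):
        gain = accuracies[i + 1] - accuracies[i]
        if gain <= threshold:
            return i + 1  # depths are 1-indexed
    # No plateau was found: fall back to the deepest tree that was evaluated.
    return len(accuracies)


# Hypothetical accuracy curves (depth 1, 2, 3, ...), invented for illustration.
curve_a = [0.78, 0.83, 0.835, 0.84]
curve_b = [0.64, 0.645, 0.70]

depth_a = select_depth(curve_a)
depth_b = select_depth(curve_b)
print(depth_a, depth_b)
```

With these made-up curves the rule stops at the first depth where the next level adds at most 0.01 accuracy, even if a later depth would improve further, which matches the "smallest tree depth" wording of the quoted setup.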