Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Certifying Robustness to Programmable Data Bias in Decision Trees
Authors: Anna Meyer, Aws Albarghouthi, Loris D'Antoni
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on datasets that are commonly used in the fairness literature, and demonstrate our approach's viability on a range of bias models. We evaluate our approach on a number of bias models and datasets from the fairness literature. Our tool can certify pointwise robustness for a variety of bias models; we also show that some datasets have unequal robustness-certification rates across demographics groups. |
| Researcher Affiliation | Academia | Anna P. Meyer, Aws Albarghouthi, and Loris D'Antoni, Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706 EMAIL |
| Pseudocode | No | The paper describes the learning algorithm and abstract transformers mathematically and textually but does not provide a formal pseudocode block or algorithm listing. |
| Open Source Code | Yes | For all datasets, we use the standard train/test split if one is provided; otherwise, we create our own train/test splits, which are available in our code repository at https://github.com/annapmeyer/antidote-P. |
| Open Datasets | Yes | We evaluate on Adult Income [17] (training n=32,561), COMPAS [29] (n=4629), and Drug Consumption [20] (n=1262). A fourth dataset, MNIST 1/7 (n=13,007), is in the Appendix. |
| Dataset Splits | No | The paper mentions using "standard train/test split" or creating "our own train/test splits" and points to the code repository for details. It does not explicitly state validation split percentages or methodology within the main text. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper states that the technique is implemented in C++ but does not provide specific version numbers for any software dependencies, libraries, or compilers used. |
| Experiment Setup | Yes | For each dataset, we choose the smallest tree depth where accuracy improves no more than 1% at the next-highest depth. For Adult Income and MNIST 1/7, this threshold is depth 2 (accuracy 83% and 97%, respectively); for COMPAS and Drug Consumption it is depth 1 (accuracy 64% and 76%, respectively). |
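
The depth-selection rule quoted under Experiment Setup can be sketched as follows. This is a minimal illustration, not the authors' C++ implementation: the function name `smallest_sufficient_depth` is hypothetical, "improves no more than 1%" is assumed to mean at most one absolute percentage point, and the accuracy values in the usage example are invented for illustration rather than taken from the paper.

```python
def smallest_sufficient_depth(accuracy_by_depth, threshold=0.01):
    """Return the smallest depth whose successor gains at most `threshold` accuracy.

    accuracy_by_depth maps tree depth (int) to test accuracy in [0, 1].
    Assumption: the 1% rule is read as an absolute difference in accuracy.
    """
    depths = sorted(accuracy_by_depth)
    # Compare each depth with the next-highest depth that was measured.
    for depth, next_depth in zip(depths, depths[1:]):
        gain = accuracy_by_depth[next_depth] - accuracy_by_depth[depth]
        if gain <= threshold:
            return depth
    # No depth met the criterion; fall back to the deepest tree measured.
    return depths[-1]


# Illustrative accuracies only (not results from the paper).
example = {1: 0.60, 2: 0.70, 3: 0.705}
chosen = smallest_sufficient_depth(example)
print(chosen)
```

With these invented values, depth 1 is rejected because depth 2 gains ten points, while depth 2 is accepted because depth 3 gains only half a point.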