Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Individually Fair Gradient Boosting
Authors: Alexander Vargo, Fan Zhang, Mikhail Yurochkin, Yuekai Sun
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also demonstrate the efficacy of our algorithm on three ML problems susceptible to algorithmic bias. |
| Researcher Affiliation | Collaboration | Alexander Vargo, Department of Mathematics, University of Michigan (EMAIL); Fan Zhang, School of Information Science and Technology, ShanghaiTech University (EMAIL); Mikhail Yurochkin, IBM Research, MIT-IBM Watson AI Lab (EMAIL); Yuekai Sun, Department of Statistics, University of Michigan (EMAIL) |
| Pseudocode | Yes | Algorithm 1 Fair gradient boosting |
| Open Source Code | No | No explicit statement of releasing code or a link to a code repository for the described methodology was found. |
| Open Datasets | Yes | The German credit data set (Dua & Graff, 2017) contains information from 1000 individuals; the ML task is to label the individuals as good or bad credit risks. The Adult data set (Dua & Graff, 2017) is another common benchmark in the fairness literature. We study the COMPAS recidivism prediction data set (Larson et al., 2016). |
| Dataset Splits | Yes | For the baseline GBDT and projecting, we select hyperparameters by splitting 20% of the training data into a validation set and evaluating the performance on the validation set. |
| Hardware Specification | No | Section C.6 mentions the number of CPUs and GPUs (e.g., "4 CPUs", "1 GPU") used for experiments but does not provide specific models, types, or memory details of the hardware. |
| Software Dependencies | Yes | We use the default parameters in the RidgeCV class from the scikit-learn package, version 0.21.3 (Pedregosa et al., 2011). We implement the adversarial debiasing methods (in the adversarial_debiasing class from IBM's AIF360 package, version 0.2.2 (Bellamy et al., 2018)). |
| Experiment Setup | Yes | Table 4: Optimal XGBoost parameters for the German credit data set. For BuDRO, we also used a perturbation budget of ϵ = 1.0. Table 6: Optimal XGBoost parameters for the Adult data set. Table 8: Optimal XGBoost parameters for the COMPAS data set. |
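The Dataset Splits entry describes hold-out tuning: 20% of the training data is set aside as a validation set, and hyperparameters are chosen by performance on that set. Below is a minimal Python sketch of that protocol, assuming scikit-learn is installed. Several parts are illustrative assumptions rather than the paper's settings: the synthetic data, the candidate grid, the hypothetical helper `select_on_validation`, the use of scikit-learn's `GradientBoostingClassifier` in place of XGBoost, and the final refit on the full training set. The tuned values actually used are reported in the paper's Tables 4, 6, and 8.

```python
# Sketch of hold-out hyperparameter selection with a 20% validation split.
# ASSUMPTIONS: GradientBoostingClassifier stands in for XGBoost; the grid,
# the data, and the helper name select_on_validation are hypothetical.
from itertools import product

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split


def select_on_validation(X_train, y_train, grid, seed=0):
    """Return (best_params, best_val_accuracy) using a 20% validation split."""
    # Split off 20% of the *training* data as the validation set.
    X_fit, X_val, y_fit, y_val = train_test_split(
        X_train, y_train, test_size=0.2, random_state=seed, stratify=y_train
    )
    best_params, best_score = None, -1.0
    keys = sorted(grid)
    # Exhaustively try every combination of candidate values.
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        model = GradientBoostingClassifier(random_state=seed, **params)
        model.fit(X_fit, y_fit)
        score = model.score(X_val, y_val)  # mean accuracy on validation set
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical synthetic data, roughly the size of German credit (1000 rows).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

grid = {"n_estimators": [50, 100], "max_depth": [2, 3], "learning_rate": [0.1]}
best_params, val_acc = select_on_validation(X_train, y_train, grid)

# ASSUMPTION: refit on the full training set with the selected hyperparameters.
final = GradientBoostingClassifier(random_state=0, **best_params)
final.fit(X_train, y_train)
print(best_params, round(val_acc, 3), round(final.score(X_test, y_test), 3))
```

Stratifying the split keeps the class balance of the validation set close to that of the training data, which matters for small data sets such as German credit; whether the original experiments stratified their split is not stated.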