Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
FairGBM: Gradient Boosting with Fairness Constraints
Authors: André Cruz, Catarina G Belém, João Bravo, Pedro Saleiro, Pedro Bizarro
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our method on five large-scale public benchmark datasets, popularly known as folktables datasets, as well as on a real-world financial services case-study. We compare FairGBM with a set of constrained optimization baselines from the Fair ML literature. |
| Researcher Affiliation | Collaboration | Feedzai; MPI for Intelligent Systems, Tübingen; UC Irvine |
| Pseudocode | Yes | Algorithm 1: FairGBM training pseudocode |
| Open Source Code | Yes | Our implementation (https://github.com/feedzai/fairgbm) shows an order of magnitude speedup in training time relative to related work, a pivotal aspect to foster the widespread adoption of FairGBM by real-world practitioners. |
| Open Datasets | Yes | We validate our method on five large-scale public benchmark datasets, popularly known as folktables datasets, as well as on a real-world financial services case-study. The folktables datasets were put forth by Ding et al. (2021) and are derived from the American Community Survey (ACS) public use microdata sample from 2018. |
| Dataset Splits | Yes | Each task is randomly split in training (60%), validation (20%), and test (20%) data. |
| Hardware Specification | Yes | ACSIncome and AOF experiments: Intel i7-8650U CPU, 32GB RAM. ACSEmployment, ACSMobility, ACSTravelTime, ACSPublicCoverage experiments: each model trained in parallel on a cluster. Resources per training job: 1 vCPU core (Intel Xeon E5-2695), 8GB RAM. |
| Software Dependencies | No | The paper mentions "LightGBM implementation" and its language "C++" and "Python interface" but does not provide specific version numbers for LightGBM, Python, or any other critical libraries/dependencies. The reproducibility checklist points to supplementary materials but does not explicitly list software versions in the text. |
| Experiment Setup | Yes | To control for the variability of results when selecting different hyperparameters, we randomly sample 100 hyperparameter configurations of each algorithm. In the case of EG and GS, both algorithms already fit n base estimators as part of a single training procedure. Hence, we run 10 trials of EG and GS, each with a budget of n = 10 iterations, for a total budget of 100 models trained (leading to an equal budget for all algorithms). |
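The Dataset Splits and Experiment Setup rows together describe an evaluation protocol: a random 60/20/20 split per task, 100 randomly sampled hyperparameter configurations per algorithm, and, for EG and GS, 10 trials of n = 10 iterations each, so that every algorithm trains 10 × 10 = 100 models. The following is a minimal Python sketch of that bookkeeping, assuming a plain index shuffle for the split. The function names (`split_indices`, `sample_configs`, `models_trained`), the seeds, and the search space are hypothetical illustrations and are not taken from the paper or the FairGBM repository.

```python
import random
from typing import Dict, List, Tuple


def split_indices(n_rows: int, seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Randomly partition row indices into ~60% train, ~20% validation, ~20% test."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    # Integer arithmetic avoids float rounding; any remainder rows go to the test split.
    n_train = n_rows * 60 // 100
    n_val = n_rows * 20 // 100
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def sample_configs(n_configs: int, seed: int = 0) -> List[Dict[str, float]]:
    """Draw random configurations from a HYPOTHETICAL search space (not the paper's)."""
    rng = random.Random(seed)
    return [
        {
            "learning_rate": 10 ** rng.uniform(-3, -1),  # log-uniform in [1e-3, 1e-1]
            "num_leaves": rng.randint(8, 128),           # inclusive integer range
        }
        for _ in range(n_configs)
    ]


def models_trained(algorithm: str, n_configs: int = 100,
                   n_trials: int = 10, n_iterations: int = 10) -> int:
    """Number of models fitted per algorithm under the equal-budget protocol."""
    if algorithm in ("EG", "GS"):
        # EG and GS fit n base estimators within a single training run,
        # so 10 trials x n = 10 iterations = 100 models.
        return n_trials * n_iterations
    # Every other algorithm: one model per sampled hyperparameter configuration.
    return n_configs


if __name__ == "__main__":
    train, val, test = split_indices(1_000, seed=42)
    print(len(train), len(val), len(test))  # 600 200 200
    for algo in ("FairGBM", "EG", "GS"):
        print(algo, models_trained(algo))  # each line ends in 100
```

The point of the budget function is that EG and GS are not sampled 100 times each: because one EG or GS run already produces n models, running 10 trials with n = 10 brings them to the same 100-model budget as the randomly searched algorithms.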