Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Can Fairness be Automated? Guidelines and Opportunities for Fairness-aware AutoML
Authors: Hilde Weerts, Florian Pfisterer, Matthias Feurer, Katharina Eggensperger, Edward Bergman, Noor Awad, Joaquin Vanschoren, Mykola Pechenizkiy, Bernd Bischl, Frank Hutter
JAIR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We collect datasets used for empirical comparisons in the fairness-aware AutoML works we cite in order to highlight the limited scope and quantity of datasets used during benchmarks. The design of AutoML systems is often largely benchmark-driven: systems are developed to compete on standardized test suites (Gijsbers et al., 2022) or in AutoML competitions (Guyon et al., 2019). This has the benefit that new system components are immediately tested for their empirical performance and only included if they provide substantial benefits. |
| Researcher Affiliation | Academia | Hilde Weerts (Eindhoven University of Technology); Florian Pfisterer (Ludwig-Maximilians-Universität München and Munich Center for Machine Learning); Matthias Feurer (Albert-Ludwigs-Universität Freiburg); Katharina Eggensperger (Albert-Ludwigs-Universität Freiburg and University of Tübingen); Edward Bergman (Albert-Ludwigs-Universität Freiburg); Noor Awad (Albert-Ludwigs-Universität Freiburg); Joaquin Vanschoren (Eindhoven University of Technology); Mykola Pechenizkiy (Eindhoven University of Technology); Bernd Bischl (Ludwig-Maximilians-Universität München and Munich Center for Machine Learning); Frank Hutter (Albert-Ludwigs-Universität Freiburg) |
| Pseudocode | No | The paper describes concepts and existing approaches in fairness-aware AutoML and provides guidelines, but it does not include any structured pseudocode or algorithm blocks for a new method. |
| Open Source Code | No | The paper is a survey and does not present new methods requiring code release. It references existing open-source libraries (Bellamy et al., 2019; Weerts et al., 2023) and other AutoML systems, but does not provide source code for its own methodology or analysis. |
| Open Datasets | Yes | Appendix A ("Datasets Used in Fairness-aware AutoML Research") collects datasets used for empirical comparisons in the fairness-aware AutoML works cited, in order to highlight the limited scope and quantity of datasets used during benchmarks. Table 1 lists the fairness-related datasets used in prior research on fairness-aware AutoML: Adult, German Credit, COMPAS, Donors Choice, AOF (private), Default Risk (Kaggle), Bank, and MEPS. Additionally, the paper mentions "ProPublica's COMPAS dataset (Angwin et al., 2016)". |
| Dataset Splits | No | The paper discusses general evaluation protocols like train-valid-test splits and cross-validation, and emphasizes the importance of robust estimation for fairness metrics, but it does not specify concrete dataset splits for any particular experiment within the paper itself. |
| Hardware Specification | No | The paper is a survey and review of fairness-aware AutoML and does not report original experimental results that would require specific hardware; therefore, no hardware details are provided. |
| Software Dependencies | No | The paper is a survey and review of fairness-aware AutoML and does not describe a specific software implementation or experimental setup that would require listing software dependencies with version numbers. |
| Experiment Setup | No | The paper is a survey and review, offering guidelines and opportunities for fairness-aware AutoML. It does not present original experimental work and therefore does not include specific details about an experimental setup, such as hyperparameter values or training configurations. |
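The Dataset Splits row notes that the paper emphasizes robust estimation of fairness metrics via splits and cross-validation. A minimal sketch of that idea, with a hypothetical demographic parity difference on synthetic toy data (none of this comes from the paper; the function name, data, and fold count are illustrative assumptions):

```python
# Sketch: estimate a fairness metric per cross-validation fold instead of once
# on the full data, so its variability across folds becomes visible.
import numpy as np

def demographic_parity_difference(y_pred, group):
    """Absolute difference in positive-prediction rates between two groups (0/1)."""
    rate_a = y_pred[group == 0].mean()
    rate_b = y_pred[group == 1].mean()
    return abs(rate_a - rate_b)

rng = np.random.default_rng(0)
n = 1000
group = rng.integers(0, 2, size=n)                        # toy binary sensitive attribute
y_pred = (rng.random(n) < 0.5 + 0.1 * group).astype(int)  # deliberately biased toy predictions

# Split shuffled indices into 5 folds and compute the metric per fold.
folds = np.array_split(rng.permutation(n), 5)
per_fold = [demographic_parity_difference(y_pred[idx], group[idx]) for idx in folds]
print(f"mean={np.mean(per_fold):.3f} std={np.std(per_fold):.3f}")
```

The spread across folds (the printed standard deviation) is what a single held-out estimate would hide, which is the robustness concern the row refers to.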