Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Achievable Fairness on Your Data With Utility Guarantees
Authors: Muhammad Faaiz Taufiq, Jean-Francois Ton, Yang Liu
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments spanning tabular (e.g., Adult), image (CelebA), and language (Jigsaw) datasets underscore that our approach not only reliably quantifies the optimum achievable trade-offs across various data modalities but also helps detect suboptimality in SOTA fairness methods. |
| Researcher Affiliation | Collaboration | Muhammad Faaiz Taufiq, ByteDance Research, EMAIL; Jean-François Ton, ByteDance Research, EMAIL; Yang Liu, University of California Santa Cruz, EMAIL |
| Pseudocode | Yes | Algorithm 1 Bootstrapping for estimating ϵ(h) := Φfair(h) − Φ̂fair(h) |
| Open Source Code | Yes | The code to reproduce our experiments is provided at github.com/faaizT/DatasetFairness. |
| Open Datasets | Yes | These datasets range from tabular (Adult and COMPAS), to image-based (CelebA), and natural language processing datasets (Jigsaw). [7], [5], [28], [20] |
| Dataset Splits | Yes | Specifically, we assume access to a held-out calibration dataset Dcal := {(Xi, Ai, Yi)}i which is disjoint from the training data. ... obtained using a 10% data split as calibration dataset Dcal. ... with early stopping based on validation losses. |
| Hardware Specification | Yes | Training these simple models takes roughly 5 minutes on a Tesla-V100-SXM2-32GB GPU. ... Training this model takes roughly 1.5 hours on a Tesla-V100-SXM2-32GB GPU. ... Training this model takes roughly 6 hours on a Tesla-V100-SXM2-32GB GPU. |
| Software Dependencies | No | The paper mentions software components like 'BERT architecture [13]' and 'Feature-wise Linear Modulation (FiLM) mechanism', but it does not specify version numbers for these or other programming languages or libraries (e.g., Python version, PyTorch version, etc.). |
| Experiment Setup | Yes | We train the model for a maximum of 1000 epochs, with early stopping based on validation losses. ... we sample the parameter λ from a distribution Pλ. ... we use the log-uniform distribution as per [15] as the sampling distribution Pλ, where the uniform distribution is U[10⁻⁶, 10]. ... we follow in the footsteps of [15] to use Feature-wise Linear Modulation (FiLM) [34] layers. |
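
The quoted setup mentions two mechanical steps: drawing λ from a log-uniform distribution, and holding out 10% of the data as a disjoint calibration set. The following is a minimal sketch of what those steps might look like, not the authors' implementation. The function names `sample_log_uniform` and `split_calibration` are hypothetical, the lower bound of 10⁻⁶ is an assumed reading of the quoted range, and the dataset size of 1000 is purely illustrative.

```python
import math
import random


def sample_log_uniform(low: float, high: float, rng: random.Random) -> float:
    """Draw a value whose base-10 logarithm is uniform on [log10(low), log10(high)]."""
    exponent = rng.uniform(math.log10(low), math.log10(high))
    return 10.0 ** exponent


def split_calibration(indices, cal_fraction: float, rng: random.Random):
    """Shuffle the indices and hold out a disjoint calibration subset."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_cal = int(round(cal_fraction * len(shuffled)))
    # First n_cal shuffled indices form the calibration set, the rest are for training.
    return shuffled[n_cal:], shuffled[:n_cal]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Assumed range [1e-6, 10] for lambda (hypothetical reading of the quoted bounds).
lambdas = [sample_log_uniform(1e-6, 10.0, rng) for _ in range(5)]

# Illustrative dataset of 1000 examples with a 10% calibration hold-out.
train_idx, cal_idx = split_calibration(range(1000), 0.10, rng)
```

Sampling the exponent uniformly, rather than λ itself, spreads the draws evenly across orders of magnitude, which is the usual motivation for a log-uniform prior over a regularization weight.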