reproducibilityindex.ai

Achievable Fairness on Your Data With Utility Guarantees

Authors: Muhammad Faaiz Taufiq, Jean-François Ton, Yang Liu

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments spanning tabular (e.g., Adult), image (CelebA), and language (Jigsaw) datasets underscore that our approach not only reliably quantifies the optimum achievable trade-offs across various data modalities but also helps detect suboptimality in SOTA fairness methods.
Researcher Affiliation	Collaboration	Muhammad Faaiz Taufiq (ByteDance Research, faaiz.taufiq@bytedance.com); Jean-François Ton (ByteDance Research, jeanfrancois@bytedance.com); Yang Liu (University of California Santa Cruz, yangliu@ucsc.edu)
Pseudocode	Yes	Algorithm 1: Bootstrapping for estimating ϵ(h) := Φ_fair(h) − Φ̂_fair(h)
Open Source Code	Yes	The code to reproduce our experiments is provided at github.com/faaizT/DatasetFairness.
Open Datasets	Yes	These datasets range from tabular (Adult and COMPAS), to image-based (CelebA), and natural language processing datasets (Jigsaw). [7], [5], [28], [20]
Dataset Splits	Yes	Specifically, we assume access to a held-out calibration dataset Dcal := {(Xi, Ai, Yi)}i which is disjoint from the training data. ... obtained using a 10% data split as calibration dataset Dcal. ... with early stopping based on validation losses.
Hardware Specification	Yes	Training these simple models takes roughly 5 minutes on a Tesla-V100-SXM2-32GB GPU. ... Training this model takes roughly 1.5 hours on a Tesla-V100-SXM2-32GB GPU. ... Training this model takes roughly 6 hours on a Tesla-V100-SXM2-32GB GPU.
Software Dependencies	No	The paper mentions software components like 'BERT architecture [13]' and 'Feature-wise Linear Modulation (FiLM) mechanism', but it does not specify version numbers for these or for any programming languages or libraries (e.g., Python version, PyTorch version).
Experiment Setup	Yes	We train the model for a maximum of 1000 epochs, with early stopping based on validation losses. ... we sample the parameter λ from a distribution Pλ. ... we use the log-uniform distribution as per [15] as the sampling distribution Pλ, where the uniform distribution is U[10^-6, 10]. ... we follow in the footsteps of [15] to use Feature-wise Linear Modulation (FiLM) [34] layers.
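
The rows above mention three mechanical steps: sampling λ from a log-uniform distribution, holding out 10% of the data as a calibration set, and bootstrapping an estimation error. The following is a minimal Python sketch of those steps and is not the paper's Algorithm 1 or its released code. Several things here are assumptions made for illustration: the bounds 10^-6 and 10 for λ, the use of a demographic parity gap as a stand-in for Φ_fair, the synthetic predictions, and all function names (`sample_log_uniform`, `demographic_parity_gap`, `bootstrap_errors`).

```python
import numpy as np


def sample_log_uniform(low, high, size, rng):
    """Draw values whose log10 is uniform on [log10(low), log10(high)]."""
    return 10.0 ** rng.uniform(np.log10(low), np.log10(high), size=size)


def demographic_parity_gap(y_pred, a):
    """|P(pred=1 | A=1) - P(pred=1 | A=0)|; an assumed stand-in for the fairness metric."""
    return abs(y_pred[a == 1].mean() - y_pred[a == 0].mean())


def bootstrap_errors(y_pred, a, n_boot, rng):
    """Resample the calibration set with replacement and return the deviation
    of each bootstrap estimate from the full-sample estimate.
    Assumes both groups appear in every resample (true for the sizes used here)."""
    point = demographic_parity_gap(y_pred, a)
    n = len(y_pred)
    errors = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # indices drawn with replacement
        errors[b] = demographic_parity_gap(y_pred[idx], a[idx]) - point
    return point, errors


rng = np.random.default_rng(0)

# Hypothetical trade-off parameters, log-uniform on [1e-6, 10].
lambdas = sample_log_uniform(1e-6, 10.0, size=5, rng=rng)

# Synthetic sensitive attribute and binary predictions (illustrative only).
n = 2000
a = rng.integers(0, 2, size=n)
y_pred = rng.binomial(1, 0.4 + 0.1 * a)

# Hold out 10% of the data as a calibration set, disjoint from the rest.
perm = rng.permutation(n)
cal_idx = perm[: n // 10]

point, errors = bootstrap_errors(y_pred[cal_idx], a[cal_idx], n_boot=500, rng=rng)
lo, hi = np.quantile(errors, [0.025, 0.975])
print("sampled lambdas:", lambdas)
print("fairness gap estimate:", point, "bootstrap error interval:", (lo, hi))
```

The quantiles of `errors` give a rough interval for how far the calibration-set estimate may sit from its target; the paper's own procedure and metric definitions should be taken from its Algorithm 1 and repository.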