Metric-Fair Classifier Derandomization

Authors: Jimmy Wu, Yatong Chen, Yang Liu

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study the problem of classifier derandomization in machine learning: given a stochastic binary classifier f : X → [0, 1], sample a deterministic classifier f̂ : X → {0, 1} that approximates the output of f in aggregate over any data distribution. Recent work revealed how to efficiently derandomize a stochastic classifier with strong output approximation guarantees, but at the cost of individual fairness: that is, if f treated similar inputs similarly, f̂ did not. In this paper, we initiate a systematic study of classifier derandomization with metric fairness guarantees. We show that the prior derandomization approach is almost maximally metric-unfair, and that a simple random threshold derandomization achieves optimal fairness preservation but with weaker output approximation. We then devise a derandomization procedure that provides an appealing tradeoff between these two: if f is α-metric fair according to a metric d with a locality-sensitive hash (LSH) family, then our derandomized f̂ is, with high probability, O(α)-metric fair and a close approximation of f. We also prove generic results applicable to all (fair and unfair) classifier derandomization procedures, including a bias-variance decomposition and reductions between various notions of metric fairness. (A code sketch of the random-threshold procedure follows this table.)
Researcher Affiliation | Academia | Department of Computer Science and Engineering, University of California, Santa Cruz, USA. Correspondence to: Jimmy Wu <jimmywu126@gmail.com>, Yatong Chen <ychen592@ucsc.edu>, Yang Liu (corresponding author) <yangliu@ucsc.edu>.
Pseudocode | No | The paper describes its procedures and definitions mathematically (e.g., the equations defining FRT and FLS), but it does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks. Illustrative sketches of these two procedures are given after this table.
Open Source Code | No | The paper does not contain any statement about making its source code available, nor does it provide any links to a code repository.
Open Datasets | No | The paper is theoretical and does not perform experiments with datasets, so it does not mention any specific datasets or their public availability.
Dataset Splits | No | The paper is theoretical and does not conduct experiments, so it does not provide details on training, validation, or test dataset splits.
Hardware Specification | No | The paper is theoretical and does not describe experiments, so it does not mention any specific hardware used to run them.
Software Dependencies | No | The paper is theoretical and focuses on mathematical derivations and algorithm design rather than specific software implementations, so it does not list software dependencies or version numbers.
Experiment Setup | No | The paper is theoretical and does not describe experiments, so it does not provide details about experimental setup, hyperparameters, or training configurations.
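
Since the paper contains no pseudocode, here is a minimal Python sketch of the random-threshold derandomization described in the abstract: draw a single threshold t ~ Uniform[0, 1], shared by all inputs, and classify by comparing f(x) to it. The function name and the toy classifier are illustrative assumptions, not the paper's notation.

```python
import random

def random_threshold_derandomize(f, seed=None):
    """Sketch of single-shared-threshold derandomization (hypothetical API).

    Draws one threshold t ~ Uniform[0, 1] and returns the deterministic
    classifier f_hat(x) = 1{f(x) >= t}. Over the draw of t,
    E|f_hat(x) - f_hat(x')| = |f(x) - f(x')|, so the metric fairness of f
    is preserved in expectation.
    """
    t = random.Random(seed).random()  # one threshold shared by all inputs
    return lambda x: 1 if f(x) >= t else 0

# Toy stochastic classifier (illustrative only): squash a score into [0, 1].
f = lambda x: min(max(0.5 + 0.1 * x, 0.0), 1.0)
f_hat = random_threshold_derandomize(f, seed=0)
print(f_hat(-2.0), f_hat(3.0))
```

The weaker output approximation noted in the abstract is visible here: because every input shares the same t, f̂ matches f only in expectation over the single draw of t, not with high probability for any fixed draw.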
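The LSH-based procedure can be sketched under the same caveat: this is one plausible reading of the construction (inputs that collide under a locality-sensitive hash share a pseudo-uniform threshold), not the paper's exact FLS definition. The hyperplane LSH, the hash-derived thresholds, and all names below are assumptions for illustration.

```python
import hashlib
import math
import random

def make_hyperplane_lsh(dim, n_bits, seed=0):
    """Random-hyperplane (SimHash) LSH for angular distance (illustrative)."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
    def bucket(x):
        # Sign pattern of x against each random hyperplane identifies a bucket.
        return tuple(int(sum(p_i * x_i for p_i, x_i in zip(p, x)) >= 0.0)
                     for p in planes)
    return bucket

def lsh_derandomize(f, lsh, seed=0):
    """Sketch of LSH-bucketed derandomization: inputs colliding under the
    LSH share one pseudo-uniform threshold, so nearby inputs receive the
    same deterministic label with high probability."""
    def threshold(bucket):
        # Derive a deterministic pseudo-uniform threshold per bucket.
        digest = hashlib.sha256(f"{seed}:{bucket}".encode()).digest()
        return int.from_bytes(digest[:8], "big") / 2**64
    return lambda x: 1 if f(x) >= threshold(lsh(x)) else 0

# Toy usage: nearby points almost always collide, hence share a threshold.
f = lambda x: 1.0 / (1.0 + math.exp(-sum(x)))  # illustrative stochastic classifier
f_hat = lsh_derandomize(f, make_hyperplane_lsh(dim=3, n_bits=8, seed=1))
print(f_hat([0.20, 0.10, 0.50]), f_hat([0.21, 0.10, 0.50]))  # likely equal
```

Hashing the bucket identifier to derive a threshold stands in for independent per-bucket uniform draws. The intuition matches the quoted guarantee: with high probability over the LSH draw, similar inputs collide and are labeled identically, which is what yields O(α)-metric fairness while distinct buckets still receive effectively independent thresholds for approximation.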