FARE: Provably Fair Representation Learning with Practical Certificates

Authors: Nikola Jovanović, Mislav Balunović, Dimitar Iliev Dimitrov, Martin Vechev

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our comprehensive experimental evaluation, we demonstrate that FARE produces practical certificates that are tight and often even comparable with purely empirical results obtained by prior methods, which establishes the practical value of our approach.
Researcher Affiliation | Academia | Nikola Jovanović¹, Mislav Balunović¹, Dimitar I. Dimitrov¹, Martin Vechev¹ ... ¹Department of Computer Science, ETH Zurich.
Pseudocode | No | The paper describes procedures and derivations in prose and mathematical notation but does not include explicit pseudocode or algorithm blocks.
Open Source Code | Yes | The implementation of FARE is publicly available at https://github.com/eth-sri/fare.
Open Datasets | Yes | We consider common fairness datasets: Health (Kaggle, 2012), ACSIncome-CA (only California), and ACSIncome-US (US-wide) (Ding et al., 2021). (A loading sketch follows the table.)
Dataset Splits | Yes | a set D of datapoints {(x^(j), s^(j))} from X is split into a training set Dtrain, used to train f, a validation set Dval, held out for the upper-bounding procedure (and not used in training of f in any capacity), and a test set Dtest, used to evaluate the empirical accuracy and fairness of downstream classifiers. (A split sketch follows the table.)
Hardware Specification | Yes | We use a single core of an Intel i9-7900X CPU with a clock speed of 3.30 GHz. All methods were given a single NVIDIA 1080 Ti GPU with 12 GB of VRAM, except FARE, which does not require a GPU.
Software Dependencies | No | The paper mentions hardware and operating systems but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | For FARE, there are four hyperparameters: γ (used for the criterion, where larger γ puts more focus on fairness), k (upper bound for the number of leaves), n_i (lower bound for the number of examples in a leaf), and v (the ratio of the training set to be used as a validation set). ... In our experiments we investigate γ ∈ [0, 1], k ∈ [2, 200], n_i ∈ [50, 1000], v ∈ {0.1, 0.2, 0.3, 0.5}. For the upper-bounding procedure, we always set ϵ = 0.05, ϵ_b = ϵ_s = 0.005, and thus ϵ_c = 0.04. Finally, when sorting categorical features as described in Section 6, we use q ∈ {1, 2, 4} in all cases. (A grid sketch follows the table.)
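
To make the open-datasets row concrete, the sketch below loads ACSIncome for California through the folktables package released with Ding et al. (2021). The survey year, horizon, and use of the default group attribute are assumptions for illustration, not necessarily the paper's exact preprocessing.

```python
# Minimal sketch (not FARE's exact preprocessing): load ACSIncome-CA via
# folktables (Ding et al., 2021). Survey year and horizon are assumptions.
from folktables import ACSDataSource, ACSIncome

data_source = ACSDataSource(survey_year="2018", horizon="1-Year", survey="person")
acs_ca = data_source.get_data(states=["CA"], download=True)

# Features X, binary income label y, and group attribute s (race, by default).
X, y, s = ACSIncome.df_to_numpy(acs_ca)
```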
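The dataset-splits row describes a three-way split: Dtrain to train the encoder f, Dval held out for the upper-bounding procedure, and Dtest for evaluating downstream classifiers. A minimal sketch follows, assuming 60/20/20 proportions and a fixed seed; the paper's actual validation fraction is governed by the hyperparameter v.

```python
# Minimal sketch of the three-way split into Dtrain / Dval / Dtest.
# The 60/20/20 proportions and the fixed seed are assumptions for illustration.
from sklearn.model_selection import train_test_split

# First carve out 40% of the data, then split it evenly into Dval and Dtest.
X_tr, X_rest, y_tr, y_rest, s_tr, s_rest = train_test_split(
    X, y, s, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test, s_val, s_test = train_test_split(
    X_rest, y_rest, s_rest, test_size=0.5, random_state=0)
```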
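Finally, the experiment-setup row quotes the searched ranges for FARE's four hyperparameters. The sketch below enumerates one possible grid over those ranges; the concrete values chosen inside the continuous ranges are assumptions, and the training call is left as a placeholder for the entry point in the public repository.

```python
# Minimal sketch of a grid over FARE's four hyperparameters; the concrete
# values inside the quoted ranges are assumptions, not the paper's exact grid.
from itertools import product

gammas = [0.0, 0.25, 0.5, 0.75, 1.0]  # gamma in [0, 1]: weight on fairness in the criterion
ks = [2, 10, 50, 100, 200]            # k in [2, 200]: upper bound on the number of leaves
n_is = [50, 100, 500, 1000]           # n_i in [50, 1000]: lower bound on examples per leaf
vs = [0.1, 0.2, 0.3, 0.5]             # v: ratio of the training set used for validation

for gamma, k, n_i, v in product(gammas, ks, n_is, vs):
    # Placeholder: train a FARE tree encoder with this configuration, then run
    # the upper-bounding procedure with eps = 0.05 and eps_b = eps_s = 0.005
    # (hence eps_c = 0.04), as quoted above.
    ...
```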