FARE: Provably Fair Representation Learning with Practical Certificates

Authors: Nikola Jovanović, Mislav Balunović, Dimitar Iliev Dimitrov, Martin Vechev

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our comprehensive experimental evaluation, we demonstrate that FARE produces practical certificates that are tight and often even comparable with purely empirical results obtained by prior methods, which establishes the practical value of our approach.
Researcher Affiliation | Academia | Nikola Jovanović¹, Mislav Balunović¹, Dimitar I. Dimitrov¹, Martin Vechev¹ ... ¹Department of Computer Science, ETH Zurich.
Pseudocode | No | The paper describes procedures and derivations in prose and mathematical notation but does not include explicit pseudocode or algorithm blocks.
Open Source Code | Yes | The implementation of FARE is publicly available at https://github.com/eth-sri/fare.
Open Datasets | Yes | We consider common fairness datasets: Health (Kaggle, 2012), ACSIncome-CA (only California), and ACSIncome-US (US-wide) (Ding et al., 2021). (A loading sketch follows the table.)
Dataset Splits | Yes | a set D of datapoints {(x^(j), s^(j))} from X is split into a training set Dtrain, used to train f, a validation set Dval, held out for the upper-bounding procedure (and not used in training of f in any capacity), and a test set Dtest, used to evaluate the empirical accuracy and fairness of downstream classifiers. (A split sketch follows the table.)
Hardware Specification | Yes | We use a single core of an Intel i9-7900X CPU with a clock speed of 3.30 GHz. All methods were given a single NVIDIA 1080 Ti GPU with 12 GB of VRAM, except FARE, which does not require a GPU.
Software Dependencies | No | The paper mentions hardware and operating systems but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | For FARE, there are four hyperparameters: γ (used for the criterion, where larger γ puts more focus on fairness), k (upper bound for the number of leaves), n_i (lower bound for the number of examples in a leaf), and v (the ratio of the training set to be used as a validation set). ... In our experiments we investigate γ ∈ [0, 1], k ∈ [2, 200], n_i ∈ [50, 1000], v ∈ {0.1, 0.2, 0.3, 0.5}. For the upper-bounding procedure, we always set ϵ = 0.05, ϵ_b = ϵ_s = 0.005, and thus ϵ_c = 0.04. Finally, when sorting categorical features as described in Section 6, we use q ∈ {1, 2, 4} in all cases. (A grid sketch follows the table.)
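
To make the open-datasets row concrete, the sketch below loads ACSIncome for California through the folktables package released with Ding et al. (2021). The survey year, horizon, and use of the default group attribute are assumptions for illustration, not necessarily the paper's exact preprocessing.

```python
# Minimal sketch (not FARE's exact preprocessing): load ACSIncome-CA via
# folktables (Ding et al., 2021). Survey year and horizon are assumptions.
from folktables import ACSDataSource, ACSIncome

data_source = ACSDataSource(survey_year="2018", horizon="1-Year", survey="person")
acs_ca = data_source.get_data(states=["CA"], download=True)

# Features X, binary income label y, and group attribute s (race, by default).
X, y, s = ACSIncome.df_to_numpy(acs_ca)
```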
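The dataset-splits row describes a three-way split: Dtrain to train the encoder f, Dval held out for the upper-bounding procedure, and Dtest for evaluating downstream classifiers. A minimal sketch follows, assuming 60/20/20 proportions and a fixed seed; the paper's actual validation fraction is governed by the hyperparameter v.

```python
# Minimal sketch of the three-way split into Dtrain / Dval / Dtest.
# The 60/20/20 proportions and the fixed seed are assumptions for illustration.
from sklearn.model_selection import train_test_split

# First carve out 40% of the data, then split it evenly into Dval and Dtest.
X_tr, X_rest, y_tr, y_rest, s_tr, s_rest = train_test_split(
    X, y, s, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test, s_val, s_test = train_test_split(
    X_rest, y_rest, s_rest, test_size=0.5, random_state=0)
```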
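Finally, the experiment-setup row quotes the searched ranges for FARE's four hyperparameters. The sketch below enumerates one possible grid over those ranges; the concrete values chosen inside the continuous ranges are assumptions, and the training call is left as a placeholder for the entry point in the public repository.

```python
# Minimal sketch of a grid over FARE's four hyperparameters; the concrete
# values inside the quoted ranges are assumptions, not the paper's exact grid.
from itertools import product

gammas = [0.0, 0.25, 0.5, 0.75, 1.0]  # gamma in [0, 1]: weight on fairness in the criterion
ks = [2, 10, 50, 100, 200]            # k in [2, 200]: upper bound on the number of leaves
n_is = [50, 100, 500, 1000]           # n_i in [50, 1000]: lower bound on examples per leaf
vs = [0.1, 0.2, 0.3, 0.5]             # v: ratio of the training set used for validation

for gamma, k, n_i, v in product(gammas, ks, n_is, vs):
    # Placeholder: train a FARE tree encoder with this configuration, then run
    # the upper-bounding procedure with eps = 0.05 and eps_b = eps_s = 0.005
    # (hence eps_c = 0.04), as quoted above.
    ...
```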