Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Fairness Under Demographic Scarce Regime
Authors: Patrik Joslin Kenfack, Samira Ebrahimi Kahou, Ulrich Aïvodji
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on five datasets showed that the proposed framework yields models with significantly better fairness-accuracy tradeoffs than classic attribute classifiers. We perform extensive experiments on a wide range of real-world datasets to demonstrate the effectiveness of the proposed framework compared to existing methods. |
| Researcher Affiliation | Academia | Patrik Joslin Kenfack EMAIL ÉTS Montréal, Mila Samira Ebrahimi Kahou EMAIL University of Calgary, Mila Ulrich Aïvodji EMAIL ÉTS Montréal, Mila |
| Pseudocode | No | The paper describes the methodology and its stages in text, and Figure 1 provides a high-level overview diagram. However, it does not include any clearly labeled 'Pseudocode' or 'Algorithm' block with structured steps. |
| Open Source Code | Yes | The source code is available at https://github.com/patrikken/fair-dsr. |
| Open Datasets | Yes | We validate our method on five real-world benchmarks widely used for bias assessment: Adult Income (Asuncion & Newman, 2007), Compas (Jeff et al., 2016), Law School (LSAC) (Wightman, 1998), CelebA (Liu et al., 2018), and the New Adult (Ding et al., 2021) dataset. |
| Dataset Splits | Yes | We use 20% of each dataset as the group-labeled dataset (D2) and 80% as the dataset without sensitive attributes (D1). All the baselines are trained on 70% of D1, and fairness and accuracy are evaluated on the 30% as the test set. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions 'Pytorch (Paszke et al., 2019)', 'Adam optimizer (Kingma & Ba, 2014)', 'Fairlearn (Bird et al., 2020)', and 'MAPIE library (Cordier et al., 2023)'. While these citations refer to specific publications, they do not provide explicit software version numbers (e.g., PyTorch 1.9) as required for reproducibility. |
| Experiment Setup | Yes | The student and teacher models were implemented as feed-forward Multi-layer Perceptrons (MLPs) with PyTorch (Paszke et al., 2019), and loss function 1 is minimized using the Adam optimizer (Kingma & Ba, 2014) with learning rate 0.001 and batch size 256. Following Yu et al. (2019); Laine & Aila (2017), we used α = 0.99 for the EMA parameter for updating the teacher weights using the student's weights across epochs. The uncertainty threshold is finetuned over the interval [0.1, 0.7] using 10% of the training data. The best-performing threshold is used as the second step's threshold to obtain D1. The uncertainty thresholds that achieved the best results are 0.30, 0.60, 0.66, and 0.45 for the Adult, Compas, LSAC, and CelebA datasets, respectively. Random forest was initialized with a maximum depth of 5 and a minimum sample leaf of 10. |
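The reported setup combines two pieces that are easy to misread: the nested data splits (20% of each dataset as the group-labeled D2, 80% as D1, with baselines trained on 70% of D1 and evaluated on the remaining 30%) and the mean-teacher EMA weight update with α = 0.99. The sketch below illustrates both, using hypothetical helper names (`split_dataset`, `ema_update`) that are not from the authors' code:

```python
# Minimal sketch of the reported setup, NOT the authors' implementation:
# the 20%/80% group-label split, the 70%/30% train/test split of D1, and
# the EMA teacher-weight update with alpha = 0.99.

import random


def split_dataset(samples, frac, seed=0):
    """Shuffle and split `samples` into (first `frac` share, remainder)."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    cut = int(frac * len(samples))
    return [samples[i] for i in idx[:cut]], [samples[i] for i in idx[cut:]]


def ema_update(teacher_weights, student_weights, alpha=0.99):
    """Mean-teacher update: teacher <- alpha * teacher + (1 - alpha) * student."""
    return [alpha * t + (1 - alpha) * s
            for t, s in zip(teacher_weights, student_weights)]


if __name__ == "__main__":
    data = list(range(1000))
    d2, d1 = split_dataset(data, 0.20)     # 20% group-labeled (D2), 80% D1
    train, test = split_dataset(d1, 0.70)  # baselines: 70% of D1 train, 30% test
    print(len(d2), len(d1), len(train), len(test))  # 200 800 560 240

    teacher = ema_update([0.0, 0.0], [1.0, 2.0])  # approximately [0.01, 0.02]
```

With α = 0.99 the teacher moves only 1% of the way toward the student per update, which is what makes it a slowly varying average of the student's weights across epochs.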