Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Fairness without Demographics through Knowledge Distillation
Authors: Junyi Chai, Taeuk Jang, Xiaoqian Wang
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on three datasets show that our method outperforms state-of-the-art alternatives, with notable improvements in group fairness and with relatively small decrease in accuracy. |
| Researcher Affiliation | Academia | Junyi Chai, Taeuk Jang, Xiaoqian Wang Elmore Family School of Electrical and Computer Engineering Purdue University West Lafayette, IN 47906 EMAIL |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] |
| Open Datasets | Yes | New Adult: The Adult reconstruction dataset (Ding et al., 2021) contains 49,531 samples with 14 attributes. COMPAS: The COMPAS dataset (Larson et al., 2016) contains 7,215 samples with 11 attributes. Following previous works on fairness (Zafar et al., 2017), we only select black and white defendants in COMPAS dataset, and the modified dataset contains 6,150 samples. The goal is to predict whether a defendant reoffends within two years, and we choose sex and race as sensitive attributes. CelebA: The CelebA dataset (Liu et al., 2015) contains 202,599 face images, each of resolution 178 x 218, with 40 binary attributes. |
| Dataset Splits | Yes | To avoid large discrepancies in testing data, before each repetition, we randomly split data into 50% training data, 10% validation data and 40% test data. |
| Hardware Specification | Yes | We implement our method in PyTorch 1.10.1 with one NVIDIA RTX-3090 GPU. |
| Software Dependencies | Yes | We implement our method in PyTorch 1.10.1 with one NVIDIA RTX-3090 GPU. |
| Experiment Setup | Yes | We build the teacher model using ResNet-152 (He et al., 2016) and student model using ResNet-18 (He et al., 2016). For student model trained on softmax label, the temperature is tuned to find the best validation accuracy. The hyperparameters of comparing methods are tuned with binary search to find global minimum, as suggested in the original paper (Hashimoto et al., 2018). |
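The 50%/10%/40% random split quoted under "Dataset Splits" can be sketched as follows. This is a minimal illustration, not the authors' code; the function name `random_split` and the fixed seed are assumptions for the example.

```python
import numpy as np

def random_split(n, seed=0, fracs=(0.5, 0.1, 0.4)):
    # Shuffle all sample indices, then carve out
    # 50% train / 10% validation / 40% test, as in the paper.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(fracs[0] * n)
    n_val = int(fracs[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

Using a fresh seed before each repetition reproduces the paper's per-repetition resampling while keeping each run deterministic.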
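The "Experiment Setup" row mentions training the student on temperature-softened softmax labels from the teacher. A generic sketch of the standard temperature-scaled distillation loss (Hinton et al., 2015) is shown below in numpy for self-containment; it is not the authors' fairness-specific objective, and the function names are assumptions for the example.

```python
import numpy as np

def softened_probs(logits, T):
    # Temperature-scaled softmax: higher T flattens the distribution.
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    # KL(teacher || student) on softened distributions, averaged over
    # the batch and scaled by T^2 so gradients stay comparable across T.
    p = softened_probs(teacher_logits, T)
    q = softened_probs(student_logits, T)
    return T * T * np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=1))
```

Tuning T on validation accuracy, as described in the row above, amounts to sweeping this single scalar and keeping the value with the best held-out performance.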