Fairness Reprogramming
Authors: Guanhua Zhang, Yihua Zhang, Yang Zhang, Wenqi Fan, Qing Li, Sijia Liu, Shiyu Chang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on both NLP and CV datasets demonstrate that our method can achieve better fairness improvements than retraining-based methods with far less data dependency under two widely-used fairness criteria. |
| Researcher Affiliation | Collaboration | Guanhua Zhang UC Santa Barbara guanhua@ucsb.edu Yihua Zhang Michigan State University zhan1908@msu.edu Yang Zhang MIT-IBM Watson AI Lab yang.zhang2@ibm.com Wenqi Fan The Hong Kong Polytechnic University wenqifan@polyu.edu.hk Qing Li The Hong Kong Polytechnic University csqli@comp.polyu.edu.hk Sijia Liu Michigan State University & MIT-IBM Watson AI Lab liusiji5@msu.edu Shiyu Chang UC Santa Barbara chang87@ucsb.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Codes are available at https://github.com/UCSB-NLP-Chang/Fairness-Reprogramming.git. |
| Open Datasets | Yes | Civil Comments [57, 58]: The dataset contains 448k texts with labels that depict the toxicity of each input. The demographic information of each text is provided. CelebA [56]: The dataset contains over 200k human face images, each with 39 binary attribute annotations. |
| Dataset Splits | Yes | For both datasets, we split the entire data into a training set, a tuning set, a validation set, and a testing set. The training set is used for the base model training, i.e., to obtain a biased model for reprogramming. The tuning set and validation set are used for trigger training and hyper-parameter selection. We report our results on the testing set. It is worth mentioning that there is no overlapping data between different sets and the size of the tuning set is much smaller than the training one. Specifically, we set the size ratio between the tuning set and the training set as 1/5 and 1/100 for Civil Comments and CelebA, respectively. |
| Hardware Specification | No | The paper states 'The computing resources used in this work were partially supported by the MIT-IBM Watson AI Lab.' but does not specify any particular hardware components like CPU/GPU models or memory. |
| Software Dependencies | No | The paper mentions software like BERT, RESNET-18, ADAMW, and ADAM but does not provide specific version numbers for these or any underlying frameworks (e.g., PyTorch, TensorFlow). |
| Experiment Setup | Yes | For NLP experiments, we use a pre-trained BERT [62] to obtain the BASE and ADVIN models. We use ADAMW [63] as the optimizer, and set the learning rate to 10^-5 for all baselines and 0.1 for FAIRREPROGRAM. For CV experiments, we consider a RESNET-18 [64] pre-trained on ImageNet. The discriminator used in ADVIN, ADVPOST and FAIRREPROGRAM is a three-layer MLP, and the parameters are optimized using ADAM with a learning rate of 0.01. We pick the best model based on the accuracy (for the BASE) or the bias scores (for all other debiasing methods) of the validation set. |
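The tuning/training size ratios quoted in the Dataset Splits row translate directly into split sizes. A minimal sketch, assuming hypothetical training-set counts (the function name and the example sizes are illustrative, not taken from the paper; only the 1/5 and 1/100 ratios are):

```python
def tuning_set_size(train_size: int, ratio_denominator: int) -> int:
    """Tuning-set size implied by the paper's tuning:training size
    ratio (1/5 for Civil Comments, 1/100 for CelebA)."""
    return train_size // ratio_denominator

# Hypothetical training-set sizes, for illustration only.
print(tuning_set_size(300_000, 5))    # Civil Comments: ratio 1/5 -> 60000
print(tuning_set_size(160_000, 100))  # CelebA: ratio 1/100 -> 1600
```

The much smaller tuning set is the point of the comparison: FAIRREPROGRAM trains only a trigger on this subset, rather than retraining the full model.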