Fairness Reprogramming
Authors: Guanhua Zhang, Yihua Zhang, Yang Zhang, Wenqi Fan, Qing Li, Sijia Liu, Shiyu Chang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on both NLP and CV datasets demonstrate that our method can achieve better fairness improvements than retraining-based methods with far less data dependency under two widely-used fairness criteria. |
| Researcher Affiliation | Collaboration | Guanhua Zhang UC Santa Barbara guanhua@ucsb.edu Yihua Zhang Michigan State University zhan1908@msu.edu Yang Zhang MIT-IBM Watson AI Lab yang.zhang2@ibm.com Wenqi Fan The Hong Kong Polytechnic University wenqifan@polyu.edu.hk Qing Li The Hong Kong Polytechnic University csqli@comp.polyu.edu.hk Sijia Liu Michigan State University & MIT-IBM Watson AI Lab liusiji5@msu.edu Shiyu Chang UC Santa Barbara chang87@ucsb.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Codes are available at https://github.com/UCSB-NLP-Chang/Fairness-Reprogramming.git. |
| Open Datasets | Yes | Civil Comments [57, 58]: The dataset contains 448k texts with labels that depict the toxicity of each input. The demographic information of each text is provided. CelebA [56]: The dataset contains over 200k human face images, each with 39 binary attribute annotations. |
| Dataset Splits | Yes | For both datasets, we split the entire data into a training set, a tuning set, a validation set, and a testing set. The training set is used for the base model training, i.e., to obtain a biased model for reprogramming. The tuning set and validation set are used for trigger training and hyper-parameter selection. We report our results on the testing set. It is worth mentioning that there is no overlapping data between different sets and the size of the tuning set is much smaller than the training one. Specifically, we set the size ratio between the tuning set and the training set as 1/5 and 1/100 for Civil Comments and CelebA, respectively. |
| Hardware Specification | No | The paper states 'The computing resources used in this work were partially supported by the MIT-IBM Watson AI Lab.' but does not specify any particular hardware components like CPU/GPU models or memory. |
| Software Dependencies | No | The paper mentions software like BERT, RESNET-18, ADAMW, and ADAM but does not provide specific version numbers for these or any underlying frameworks (e.g., PyTorch, TensorFlow). |
| Experiment Setup | Yes | For NLP experiments, we use a pre-trained BERT [62] to obtain the BASE and ADVIN models. We use ADAMW [63] as the optimizer, and set the learning rate to 10^-5 for all baselines and 0.1 for FAIRREPROGRAM. For CV experiments, we consider a RESNET-18 [64] pre-trained on ImageNet. The discriminator used in ADVIN, ADVPOST and FAIRREPROGRAM is a three-layer MLP, and the parameters are optimized using ADAM with a learning rate of 0.01. We pick the best model based on the accuracy (for the BASE) or the bias scores (for all other debiasing methods) of the validation set. |
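The tuning/training size ratios quoted in the Dataset Splits row translate directly into split sizes. A minimal sketch, assuming hypothetical training-set counts (the function name and the example sizes are illustrative, not taken from the paper; only the 1/5 and 1/100 ratios are):

```python
def tuning_set_size(train_size: int, ratio_denominator: int) -> int:
    """Tuning-set size implied by the paper's tuning:training size
    ratio (1/5 for Civil Comments, 1/100 for CelebA)."""
    return train_size // ratio_denominator

# Hypothetical training-set sizes, for illustration only.
print(tuning_set_size(300_000, 5))    # Civil Comments: ratio 1/5 -> 60000
print(tuning_set_size(160_000, 100))  # CelebA: ratio 1/100 -> 1600
```

The much smaller tuning set is the point of the comparison: FAIRREPROGRAM trains only a trigger on this subset, rather than retraining the full model.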