Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Training individually fair ML models with sensitive subspace robustness

Authors: Mikhail Yurochkin, Amanda Bower, Yuekai Sun

ICLR 2020 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present results from using SenSR to train individually fair ML models for two tasks: sentiment analysis and income prediction. We pick these two tasks to demonstrate the efficacy of SenSR on problems with structured (income prediction) and unstructured (sentiment analysis) inputs, and in which the sensitive attribute is observed (income prediction) or unobserved (sentiment analysis). We refer to Appendices C and D for the implementation details.
Researcher Affiliation | Collaboration | Mikhail Yurochkin (IBM Research; MIT-IBM Watson AI Lab); Amanda Bower (Department of Mathematics, University of Michigan); Yuekai Sun (Department of Statistics, University of Michigan)
Pseudocode | Yes | Algorithm 1: stochastic gradient method for equation 2.3; Algorithm 2: Sensitive Subspace Robustness (SenSR); Algorithm 3: estimating Σ̂ for the fair metric
Open Source Code | Yes | This section is to accompany the implementation of the SenSR algorithm and is best understood by reading it along with the code implemented using TensorFlow. https://github.com/IBM/sensitive-subspace-robustness
Open Datasets | Yes | We apply SenSR to a classification task on the Adult (Dua & Graff, 2017) dataset; we embed words using 300-dimensional GloVe (Pennington et al., 2014); we use the list of names provided in Caliskan et al. (2017)
Dataset Splits | Yes | See Table 2 for the average of each metric on the test sets over ten 80%/20% train/test splits; in Table 1 we report results averaged across 10 repetitions with 90%/10% train/test splits
Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., specific GPU/CPU models, memory, or cluster configurations) used for its experiments.
Software Dependencies | No | The paper mentions TensorFlow and the Adam optimizer but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | Table 3: SenSR hyperparameter choices in the experiments. Sentiment: E = 4K, B = 1K, s = 0.1, s_e = 10, ϵ = 0.1, f = 0.01, f_e = 10; Adult: E = 12K, B = 1K, s = 10, s_e = 50, ϵ = 10⁻³, f = 10⁻⁴, f_e = 40. We use the same learning rate of 0.001 for the parameter optimizer, but different learning rates across datasets for the subspace step (s) and full step (f).
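The split-and-average evaluation protocol quoted in the Dataset Splits row (e.g., ten repetitions of an 80%/20% train/test split, reporting the average test metric) can be sketched as below. This is an illustrative sketch only, not the authors' code; `evaluate` and the toy data are placeholders.

```python
import random

def repeated_split_eval(data, labels, evaluate, n_repeats=10,
                        test_frac=0.2, seed=0):
    """Average a test metric over repeated random train/test splits."""
    rng = random.Random(seed)
    n = len(data)
    scores = []
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        n_test = int(n * test_frac)
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        train = [(data[i], labels[i]) for i in train_idx]
        test = [(data[i], labels[i]) for i in test_idx]
        scores.append(evaluate(train, test))
    return sum(scores) / len(scores)

# Toy usage: the "metric" is just the test fraction, to show the call shape.
avg = repeated_split_eval(list(range(100)), [0] * 100,
                          evaluate=lambda tr, te: len(te) / 100)
```

For the 90%/10% protocol mentioned for Table 1, one would pass `test_frac=0.1` instead.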
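The flattened Table 3 values can be re-encoded as a config, as sketched below. The symbol meanings (E: epochs, B: batch size, s/s_e: subspace step size and steps, ϵ, f/f_e: full step size and steps) and the negative exponents on the Adult step sizes are assumptions made while reconstructing the extraction-damaged table; the dict and helper names are hypothetical, not from the repository.

```python
# Hypothetical re-encoding of the Table 3 hyperparameters quoted above.
SENSR_HPARAMS = {
    "sentiment": {"E": 4_000, "B": 1_000, "s": 0.1, "s_e": 10,
                  "eps": 0.1, "f": 0.01, "f_e": 10},
    "adult":     {"E": 12_000, "B": 1_000, "s": 10.0, "s_e": 50,
                  "eps": 1e-3, "f": 1e-4, "f_e": 40},
}

# Shared Adam learning rate for the model parameters (same for both tasks).
PARAM_LR = 1e-3

def step_sizes(task):
    """Return the (subspace step, full step) learning rates for a task,
    which differ across datasets per the quoted setup."""
    hp = SENSR_HPARAMS[task]
    return hp["s"], hp["f"]
```

Keeping the per-task step sizes separate from the shared parameter learning rate mirrors the setup described in the row above, where only s and f vary across datasets.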