Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Training individually fair ML models with sensitive subspace robustness
Authors: Mikhail Yurochkin, Amanda Bower, Yuekai Sun
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present results from using SenSR to train individually fair ML models for two tasks: sentiment analysis and income prediction. We pick these two tasks to demonstrate the efficacy of SenSR on problems with structured (income prediction) and unstructured (sentiment analysis) inputs and in which the sensitive attribute (income prediction) is observed and unobserved (sentiment analysis). We refer to Appendix C and D for the implementation details. |
| Researcher Affiliation | Collaboration | Mikhail Yurochkin, IBM Research, MIT-IBM Watson AI Lab; Amanda Bower, Department of Mathematics, University of Michigan; Yuekai Sun, Department of Statistics, University of Michigan |
| Pseudocode | Yes | Algorithm 1: stochastic gradient method for equation 2.3; Algorithm 2: Sensitive Subspace Robustness (SenSR); Algorithm 3: estimating Σ̂ for the fair metric |
| Open Source Code | Yes | This section is to accompany the implementation of the SenSR algorithm and is best understood by reading it along with the code implemented using TensorFlow: https://github.com/IBM/sensitive-subspace-robustness |
| Open Datasets | Yes | we apply SenSR to a classification task on the Adult (Dua & Graff, 2017) data set; We embed words using 300-dimensional GloVe (Pennington et al., 2014); We use the list of names provided in Caliskan et al. (2017) |
| Dataset Splits | Yes | See Table 2 for the average of each metric on the test sets over ten 80%/20% train/test splits; In Table 1 we report results averaged across 10 repetitions with 90%/10% train/test splits |
| Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., specific GPU/CPU models, memory, or cluster configurations) used for its experiments. |
| Software Dependencies | No | The paper mentions 'TensorFlow' and 'Adam optimizer' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Table 3: SenSR hyperparameter choices in the experiments. Sentiment: E = 4K, B = 1K, s = 0.1, s_e = 10, ϵ = 0.1, f = 0.01, f_e = 10. Adult: E = 12K, B = 1K, s = 10, s_e = 50, ϵ = 10⁻³, f = 10⁻⁴, f_e = 40. We use the same learning rate of 0.001 for the parameter optimizer, but different learning rates across datasets for the subspace step (s) and the full step (f). |
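The Dataset Splits row reports metrics averaged over repeated random train/test splits (ten 80%/20% splits for Adult, ten 90%/10% splits for sentiment). A minimal sketch of that evaluation protocol is below; the function name `average_over_splits` and the majority-class placeholder model are illustrative assumptions, not part of the paper's code.

```python
import numpy as np

def average_over_splits(y, train_frac=0.8, n_repeats=10, seed=0):
    """Average a test metric over repeated random train/test splits,
    mirroring the reported protocol (e.g. ten 80%/20% splits).

    `y` holds integer class labels. The model here is a trivial
    placeholder (predict the training fold's majority class); the
    paper instead trains SenSR on each training fold.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = []
    for _ in range(n_repeats):
        perm = rng.permutation(n)           # fresh random split each repeat
        cut = int(train_frac * n)
        train, test = perm[:cut], perm[cut:]
        majority = np.bincount(y[train]).argmax()
        scores.append(float(np.mean(y[test] == majority)))
    return float(np.mean(scores))
```

Swapping `train_frac=0.8, n_repeats=10` for `train_frac=0.9` reproduces the sentiment-analysis variant of the protocol; per-split metrics can also be kept to report standard deviations alongside the means.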