Training Well-Generalizing Classifiers for Fairness Metrics and Other Data-Dependent Constraints

Authors: Andrew Cotter, Maya Gupta, Heinrich Jiang, Nathan Srebro, Karthik Sridharan, Serena Wang, Blake Woodworth, Seungil You

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In Section 5, we perform experiments demonstrating that our two-dataset approach successfully improves constraint generalization even when our theorems do not hold. In other words, providing independent datasets to each player works well as a heuristic for improving constraint generalization." (A simplified illustration of this two-dataset heuristic is sketched after the table.)
Researcher Affiliation | Collaboration | "1 Google AI, Mountain View, CA, USA; 2 Toyota Technological Institute at Chicago, Chicago, IL, USA; 3 Cornell University, Computer Science Department, Ithaca, NY, USA; 4 Kakao Mobility, Seongnam-si, Gyeonggi-do, South Korea."
Pseudocode | Yes | "Algorithm 1: Finds an approximate equilibrium of the empirical proxy-Lagrangian game (Definition 1). ..."
Open Source Code | No | "Our implementation uses TensorFlow, and is based on Cotter et al. (2019)'s open-source constrained optimization library."
Open Datasets | Yes | "Communities and Crime: This UCI dataset (Dheeru & Karra Taniskidou, 2017) includes features aggregated from census and law enforcement data ..."
Dataset Splits | Yes | "For the two-dataset experiments, the training set was randomly permuted and split in half, into S(trn) and S(val)." (A minimal sketch of this split appears after the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running the experiments.
Software Dependencies | No | The paper mentions TensorFlow and ADAM but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | "To avoid a hyperparameter search, we replace the stochastic gradient updates of Algorithms 3 and 4 with ADAM (Kingma & Ba, 2014), using the default parameters. For both our two-dataset algorithm and the one-dataset baseline, the result of training is a sequence of iterates θ(1), ..., θ(T), but instead of keeping track of the full sequence, we only store a total of 100 evenly-spaced iterates for each run. Rather than using the weighted predictor of Theorems 1 and 2, we use the shrinking procedure of Cotter et al. (2019) (see Appendix C) to find the best stochastic classifier supported on the sequence of 100 iterates." (The iterate-snapshot bookkeeping is sketched after the table.)
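
The two-dataset heuristic quoted in the Research Type row can be illustrated with a simplified alternating update: the model player takes gradient steps computed on S(trn), while the Lagrange multipliers are updated from constraint violations measured on the independent S(val). This is a minimal sketch using an ordinary projected-multiplier Lagrangian, not the paper's proxy-Lagrangian swap-regret update (Algorithm 1); all names here (lagrangian_grad_trn, violations_val, the learning rates) are hypothetical placeholders.

import numpy as np

def two_dataset_updates(lagrangian_grad_trn, violations_val, theta0,
                        num_constraints, num_steps=1000,
                        lr_theta=0.01, lr_lambda=0.01):
    """Simplified two-player loop: theta is trained on S(trn), while the
    multipliers lam respond to constraint violations evaluated on S(val)."""
    theta = np.asarray(theta0, dtype=float).copy()
    lam = np.zeros(num_constraints)  # Lagrange multipliers, kept nonnegative
    iterates = []
    for _ in range(num_steps):
        # Model player: descend the Lagrangian using gradients from S(trn).
        theta = theta - lr_theta * lagrangian_grad_trn(theta, lam)
        # Multiplier player: ascend on violations measured on the held-out S(val).
        lam = np.maximum(0.0, lam + lr_lambda * violations_val(theta))
        iterates.append(theta.copy())
    return iterates, lam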
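
The random 50/50 split into S(trn) and S(val) described in the Dataset Splits row can be sketched as follows; the function name, the NumPy-array inputs, and the fixed seed are assumptions for illustration only.

import numpy as np

def two_dataset_split(features, labels, seed=0):
    """Randomly permute the training set and split it in half, mirroring
    the S(trn) / S(val) split described in the table above."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(labels))
    half = len(perm) // 2
    trn_idx, val_idx = perm[:half], perm[half:]
    s_trn = (features[trn_idx], labels[trn_idx])
    s_val = (features[val_idx], labels[val_idx])
    return s_trn, s_val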
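
The Experiment Setup row's iterate bookkeeping (default-parameter ADAM plus 100 evenly spaced stored iterates) can be sketched as below. The paper does not name these helpers, and the shrinking procedure of Cotter et al. (2019) that selects the best stochastic classifier over the stored iterates is not reproduced here.

import numpy as np
import tensorflow as tf

# ADAM with its default parameters, matching the "no hyperparameter search"
# setup quoted above (the paper does not specify a TensorFlow version).
optimizer = tf.keras.optimizers.Adam()

def snapshot_steps(total_steps, num_snapshots=100):
    """Training steps at which to store model iterates, so that exactly
    `num_snapshots` evenly spaced iterates out of theta(1), ..., theta(T)
    are kept."""
    return set(np.linspace(0, total_steps - 1, num_snapshots).astype(int).tolist())

# During training, copy the model parameters at each step in snapshot_steps(T);
# the shrinking procedure is then run over these stored iterates to pick the
# best stochastic classifier supported on them.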