Training Well-Generalizing Classifiers for Fairness Metrics and Other Data-Dependent Constraints
Authors: Andrew Cotter, Maya Gupta, Heinrich Jiang, Nathan Srebro, Karthik Sridharan, Serena Wang, Blake Woodworth, Seungil You
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 5, we perform experiments demonstrating that our two-dataset approach successfully improves constraint generalization even when our theorems do not hold. In other words, providing independent datasets to each player works well as a heuristic for improving constraint generalization. |
| Researcher Affiliation | Collaboration | 1Google AI, Mountain View, CA, USA 2Toyota Technological Institute at Chicago, Chicago, IL, USA 3Cornell University, Computer Science Department, Ithaca, NY, USA 4Kakao Mobility, Seongnam-si, Geyonggi-do, South Korea. |
| Pseudocode | Yes | Algorithm 1 Finds an approximate equilibrium of the empirical proxy-Lagrangian game (Definition 1).... |
| Open Source Code | No | Our implementation uses TensorFlow, and is based on Cotter et al. (2019)’s open-source constrained optimization library. |
| Open Datasets | Yes | Communities and Crime: This UCI dataset (Dheeru & Karra Taniskidou, 2017) includes features aggregated from census and law enforcement data... |
| Dataset Splits | Yes | For the two-dataset experiments, the training set was randomly permuted and split in half, into S(trn) and S(val). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'TensorFlow' and 'ADAM' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | To avoid a hyperparameter search, we replace the stochastic gradient updates of Algorithms 3 and 4 with ADAM (Kingma & Ba, 2014), using the default parameters. For both our two-dataset algorithm and the one-dataset baseline, the result of training is a sequence of iterates θ(1), …, θ(T), but instead of keeping track of the full sequence, we only store a total of 100 evenly-spaced iterates for each run. Rather than using the weighted predictor of Theorems 1 and 2, we use the shrinking procedure of Cotter et al. (2019) (see Appendix C) to find the best stochastic classifier supported on the sequence of 100 iterates. |
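The dataset-split row describes permuting the training set randomly and splitting it in half into S(trn) and S(val). A minimal sketch of that step, assuming NumPy arrays; the function name `two_dataset_split` and the `seed` parameter are hypothetical, not from the paper:

```python
import numpy as np

def two_dataset_split(X, y, seed=0):
    """Randomly permute the training set and split it in half,
    into S_trn and S_val (as in the paper's two-dataset experiments)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))  # random permutation of example indices
    half = len(X) // 2
    trn_idx, val_idx = idx[:half], idx[half:]
    return (X[trn_idx], y[trn_idx]), (X[val_idx], y[val_idx])
```

In the two-player formulation, S(trn) would drive the objective player's updates and S(val) the constraint player's, which is the heuristic the paper evaluates for improving constraint generalization.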
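The experiment-setup row mentions storing only 100 evenly-spaced iterates from the sequence θ(1), …, θ(T) rather than the full run. A minimal sketch of selecting those checkpoint indices, assuming NumPy; the function name `evenly_spaced_checkpoints` is hypothetical:

```python
import numpy as np

def evenly_spaced_checkpoints(num_iters, num_keep=100):
    """Return num_keep evenly-spaced iterate indices out of num_iters,
    matching the paper's choice of storing 100 iterates per run."""
    return np.linspace(0, num_iters - 1, num_keep, dtype=int)
```

The best stochastic classifier supported on these saved iterates is then chosen by the shrinking procedure of Cotter et al. (2019), which is not reconstructed here.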