Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Training Well-Generalizing Classifiers for Fairness Metrics and Other Data-Dependent Constraints
Authors: Andrew Cotter, Maya Gupta, Heinrich Jiang, Nathan Srebro, Karthik Sridharan, Serena Wang, Blake Woodworth, Seungil You
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 5, we perform experiments demonstrating that our two-dataset approach successfully improves constraint generalization even when our theorems do not hold. In other words, providing independent datasets to each player works well as a heuristic for improving constraint generalization. |
| Researcher Affiliation | Collaboration | 1Google AI, Mountain View, CA, USA 2Toyota Technological Institute at Chicago, Chicago, IL, USA 3Cornell University, Computer Science Department, Ithaca, NY, USA 4Kakao Mobility, Seongnam-si, Gyeonggi-do, South Korea. |
| Pseudocode | Yes | Algorithm 1 Finds an approximate equilibrium of the empirical proxy-Lagrangian game (Definition 1).... |
| Open Source Code | No | Our implementation uses TensorFlow, and is based on Cotter et al. (2019)’s open-source constrained optimization library. |
| Open Datasets | Yes | Communities and Crime: This UCI dataset (Dheeru & Karra Taniskidou, 2017) includes features aggregated from census and law enforcement data... |
| Dataset Splits | Yes | For the two-dataset experiments, the training set was randomly permuted and split in half, into $S^{(\mathrm{trn})}$ and $S^{(\mathrm{val})}$. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'TensorFlow' and 'ADAM' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | To avoid a hyperparameter search, we replace the stochastic gradient updates of Algorithms 3 and 4 with ADAM (Kingma & Ba, 2014), using the default parameters. For both our two-dataset algorithm and the one-dataset baseline, the result of training is a sequence of iterates $\theta^{(1)}, \ldots, \theta^{(T)}$, but instead of keeping track of the full sequence, we only store a total of 100 evenly-spaced iterates for each run. Rather than using the weighted predictor of Theorems 1 and 2, we use the shrinking procedure of Cotter et al. (2019) (see Appendix C) to find the best stochastic classifier supported on the sequence of 100 iterates. |
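
The Dataset Splits entry describes randomly permuting the training set and splitting it in half, so that each player in the two-dataset approach can be given an independent dataset. A minimal Python sketch of that step follows. The function name `two_dataset_split` and the `seed` parameter are hypothetical, and the handling of an odd-sized training set (the extra example goes to the second half) is an assumption, since the paper only says "in half".

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Example = TypeVar("Example")


def two_dataset_split(
    training_set: Sequence[Example], seed: int = 0
) -> Tuple[List[Example], List[Example]]:
    """Randomly permute the training set and split it in half.

    Hypothetical helper: the first half plays the role of S^(trn), the
    second half S^(val). For an odd number of examples, the second half
    receives the extra example (an assumption; the paper does not say).
    """
    rng = random.Random(seed)      # local RNG so the split is reproducible
    permuted = list(training_set)  # copy; do not mutate the caller's data
    rng.shuffle(permuted)          # uniform random permutation
    half = len(permuted) // 2
    return permuted[:half], permuted[half:]


s_trn, s_val = two_dataset_split(list(range(10)), seed=42)
print(len(s_trn), len(s_val))  # two disjoint halves of size 5
```

Fixing the seed keeps the split identical across runs, which makes the two-dataset experiments and the one-dataset baseline easier to compare.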
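
The Experiment Setup entry states that only 100 evenly-spaced iterates are stored per run instead of the full sequence of $T$ iterates. The sketch below shows one way to do that. It assumes "evenly spaced" means storing iterate $t_k = \lceil kT/K \rceil$ for $k = 1, \ldots, K$ with $K = 100$, which always keeps the final iterate; the paper does not give the exact indices. The names `snapshot_steps` and `train_with_snapshots` are hypothetical, and `step_fn` is a placeholder for one optimizer update (for example an ADAM step), which is not implemented here.

```python
from typing import Any, Callable, List, Tuple


def snapshot_steps(num_iterations: int, num_snapshots: int = 100) -> List[int]:
    """Return the 1-based iterations t whose iterate theta^(t) is stored.

    Assumed spacing: t_k = ceil(k * T / K) for k = 1..K, so the last
    iterate is always kept. If K >= T, every iterate is kept.
    """
    if num_iterations <= 0 or num_snapshots <= 0:
        raise ValueError("num_iterations and num_snapshots must be positive")
    if num_snapshots >= num_iterations:
        return list(range(1, num_iterations + 1))
    # Integer ceiling division avoids floating-point rounding issues.
    return [
        (k * num_iterations + num_snapshots - 1) // num_snapshots
        for k in range(1, num_snapshots + 1)
    ]


def train_with_snapshots(
    step_fn: Callable[[Any], Any],
    theta0: Any,
    num_iterations: int,
    num_snapshots: int = 100,
) -> List[Tuple[int, Any]]:
    """Run num_iterations updates, storing (t, theta^(t)) at snapshot steps."""
    keep = set(snapshot_steps(num_iterations, num_snapshots))
    theta = theta0
    stored: List[Tuple[int, Any]] = []
    for t in range(1, num_iterations + 1):
        theta = step_fn(theta)  # placeholder for one optimizer update
        if t in keep:
            stored.append((t, theta))
    return stored


# Toy usage: a "parameter" that just counts updates.
snapshots = train_with_snapshots(lambda theta: theta + 1, 0, num_iterations=1000)
print(len(snapshots), snapshots[0], snapshots[-1])
```

Because $kT/K$ grows by more than 1 per step whenever $K < T$, the rounded indices are strictly increasing, so exactly $K$ distinct iterates are kept. The stored list is the candidate set over which a procedure such as the shrinking step of Cotter et al. (2019) would then search for a stochastic classifier.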