Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Learning Distributionally Robust Models at Scale via Composite Optimization
Authors: Farzin Haddadpour, Mohammad Mahdi Kamani, Mehrdad Mahdavi, Amin Karbasi
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also provide empirical results that demonstrate the effectiveness of our proposed algorithm with respect to the prior art in order to learn robust models from very large datasets. |
| Researcher Affiliation | Collaboration | Farzin Haddadpour (Yale Institute for Network Science, Yale University); Mohammad Mahdi Kamani (Wyze Labs Inc.); Mehrdad Mahdavi (Department of Computer Science & Engineering, The Pennsylvania State University); Amin Karbasi (Yale Institute for Network Science, Yale University) |
| Pseudocode | Yes | Algorithm 1: Generalized Composite Incremental Variance Reduction (GCIVR (x(0))) |
| Open Source Code | Yes | The code for the experiments is available at this repository. http://github.com/haddadpour/composite_optimization |
| Open Datasets | Yes | In this experiment, we use the Adult dataset (Dua & Graff, 2017), and consider race groups of white, black, and other as protected groups. [...] We train a linear classifier with logistic regression, and report the overall error rate of the classifier, as well as the maximum violation of the fairness constraints (equal opportunity) over true group memberships. [...] We set ϵ = 0.05 and the noise level to 0.3. [...] learn a linear classifier with logistic regression to predict the crime rate for a community on Communities and Crime dataset (Dua & Graff, 2017). [...] For this experiment, we use Microsoft Learning to Rank Dataset (MSLR-WEB10K) (Qin & Liu, 2013), which contains 10K queries and 136 features. |
| Dataset Splits | Yes | We use 1000 queries in the training and 100 queries in the test datasets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We train a linear classifier with logistic regression, and report the overall error rate of the classifier, as well as the maximum violation of the fairness constraints (equal opportunity) over true group memberships. [...] We compare with the unconstrained optimization and Heavily-constrained algorithm with a 2-layer neural network, each with 100 nodes as their Lagrange model, as described in their paper. [...] For this experiment we use a non-convex objective, where the model is a two-layer neural network each with 128 nodes and cross-entropy as the loss function. |
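
The two metrics quoted in the Experiment Setup row (overall error rate and maximum equal-opportunity violation over true group memberships) could be computed roughly as in the sketch below. This is an illustration only, not the authors' code: the function names are hypothetical, and the violation is assumed to take the form "overall true positive rate minus group true positive rate minus ϵ", with ϵ = 0.05 taken from the excerpt above. The paper's exact constraint may differ.

```python
from typing import Hashable, Sequence


def error_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of examples whose predicted label differs from the true label."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def true_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of truly positive examples (label 1) that are predicted positive."""
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_on_positives:
        raise ValueError("no positive examples; true positive rate is undefined")
    return sum(p == 1 for p in preds_on_positives) / len(preds_on_positives)


def max_equal_opportunity_violation(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    groups: Sequence[Hashable],
    epsilon: float = 0.05,
) -> float:
    """Largest value of (overall TPR - group TPR - epsilon) across groups.

    ASSUMPTION: this is one common way to write the equal-opportunity
    constraint; a positive result means at least one group is violated.
    Groups without any positive example are skipped.
    """
    overall_tpr = true_positive_rate(y_true, y_pred)
    violations = []
    for group in set(groups):
        g_true = [t for t, g in zip(y_true, groups) if g == group]
        g_pred = [p for p, g in zip(y_pred, groups) if g == group]
        if 1 not in g_true:
            continue
        group_tpr = true_positive_rate(g_true, g_pred)
        violations.append(overall_tpr - group_tpr - epsilon)
    return max(violations)


# Toy data, made up for illustration (not from the Adult dataset).
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
groups = ["a", "a", "b", "b", "a", "b"]

overall_error = error_rate(y_true, y_pred)
worst_violation = max_equal_opportunity_violation(y_true, y_pred, groups)
```

Here `groups` holds the true group membership of each example, which is how the excerpt says the violation is evaluated, even when training used noisy group labels.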