Distributionally Robust Optimization with Bias and Variance Reduction
Authors: Ronak Mehta, Vincent Roulet, Krishna Pillutla, Zaid Harchaoui
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that Prospect can converge 2-3x faster than baselines such as SGD and stochastic saddle-point methods on distribution shift and fairness benchmarks spanning tabular, vision, and language domains. |
| Researcher Affiliation | Collaboration | University of Washington, Google DeepMind, Google Research. |
| Pseudocode | Yes | Algorithm 1: Prospect |
| Open Source Code | Yes | The algorithm implementation and data preparation code is made publicly available online: https://github.com/ronakdm/prospect. |
| Open Datasets | Yes | The datasets used are yacht (n = 244) (Tsanas & Xifara, 2012), energy (n = 614) (Baressi Šegota et al., 2020), concrete (n = 824) (Yeh, 2006), kin8nm (n = 6553) (Akujuobi & Zhang, 2017), and power (n = 7654) (Tüfekci, 2014). |
| Dataset Splits | Yes | The sample sizes, dimensions, and source of the datasets are summarized in Tab. 2, where d refers to the dimension of each φ(x_i)... In practice, the regularization parameter µ and shift cost ν are tuned by a statistical metric, i.e., generalization error as measured on a validation set. |
| Hardware Specification | Yes | No GPUs were used in the study; experiments were run on a CPU workstation with an Intel i9 processor (2.80 GHz clock speed), 32 virtual cores, and 126 GB of memory. |
| Software Dependencies | No | The code used in this project was written in Python 3 using the PyTorch and Numba packages for automatic differentiation and just-in-time compilation, respectively. No specific version numbers for PyTorch or Numba are provided. |
| Experiment Setup | Yes | We fix a minibatch size of 64 for SGD and SRDA and an epoch length of N = n for LSVRG... The learning rate η is chosen in the set {1×10⁻⁴, 3×10⁻⁴, 1×10⁻³, 3×10⁻³, 1×10⁻², 3×10⁻², 1×10⁻¹, 3×10⁻¹, 1×10⁰, 3×10⁰}, with values two orders of magnitude lower used for acsincome due to its sparsity. We fix the shift cost ν = 1 and regularization parameter µ = 1/n. |
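
To make the "Experiment Setup" and "Open Datasets" rows concrete, below is a minimal Python sketch of the hyperparameter sweep they describe. The dataset sizes, learning-rate grid, batch size, epoch length, shift cost ν = 1, and regularization µ = 1/n are taken from the rows above; the helper names (`DATASETS`, `sweep_configs`), the optimizer list, and the config layout are illustrative assumptions, not the authors' released code (see https://github.com/ronakdm/prospect for the actual implementation).

```python
# Sketch of the experiment configuration reported in the table above.
# Grid values and fixed settings come from the paper's reported setup;
# everything else (names, layout) is an illustrative assumption.

import itertools

# Datasets and sample sizes from the "Open Datasets" row.
DATASETS = {
    "yacht": 244,
    "energy": 614,
    "concrete": 824,
    "kin8nm": 6553,
    "power": 7654,
}

# Learning-rate grid {1e-4, 3e-4, ..., 1e0, 3e0} from the "Experiment Setup" row.
LEARNING_RATES = [c * 10.0 ** e for e in range(-4, 1) for c in (1, 3)]


def sweep_configs(n_train, optimizers=("sgd", "srda", "lsvrg", "prospect")):
    """Yield one config per (optimizer, learning rate) pair.

    Fixed settings follow the reported setup: minibatch size 64 for
    SGD/SRDA, epoch length N = n for LSVRG, shift cost nu = 1, and
    regularization mu = 1/n.
    """
    for opt, lr in itertools.product(optimizers, LEARNING_RATES):
        yield {
            "optimizer": opt,
            "lr": lr,
            "batch_size": 64 if opt in ("sgd", "srda") else None,
            "epoch_len": n_train if opt == "lsvrg" else None,
            "shift_cost": 1.0,                # nu = 1
            "regularization": 1.0 / n_train,  # mu = 1/n
        }


if __name__ == "__main__":
    # Example: enumerate all sweep configurations for kin8nm (n = 6553).
    for cfg in sweep_configs(DATASETS["kin8nm"]):
        print(cfg)
```

Per the "Dataset Splits" row, a sweep like this would be scored by generalization error on a held-out validation set to select µ, ν, and η.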