Distributionally Robust Optimization with Bias and Variance Reduction

Authors: Ronak Mehta, Vincent Roulet, Krishna Pillutla, Zaid Harchaoui

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we show that Prospect can converge 2-3x faster than baselines such as SGD and stochastic saddle-point methods on distribution shift and fairness benchmarks spanning tabular, vision, and language domains.
Researcher Affiliation | Collaboration | University of Washington, Google DeepMind, Google Research.
Pseudocode | Yes | Algorithm 1 Prospect. (See the objective sketch after this table.)
Open Source Code | Yes | The algorithm implementation and data preparation code is made publicly available online: https://github.com/ronakdm/prospect.
Open Datasets | Yes | The datasets used are yacht (n = 244) (Tsanas & Xifara, 2012), energy (n = 614) (Baressi Šegota et al., 2020), concrete (n = 824) (Yeh, 2006), kin8nm (n = 6553) (Akujuobi & Zhang, 2017), and power (n = 7654) (Tüfekci, 2014). (See the loading example below.)
Dataset Splits | Yes | The sample sizes, dimensions, and source of the datasets are summarized in Tab. 2, where d refers to the dimension of each φ(xᵢ)... In practice, the regularization parameter µ and shift cost ν are tuned by a statistical metric, i.e., generalization error as measured on a validation set. (See the tuning sketch below.)
Hardware Specification | Yes | No GPUs were used in the study; experiments were run on a CPU workstation with an Intel i9 processor, a clock speed of 2.80 GHz, 32 virtual cores, and 126 GB of memory.
Software Dependencies | No | The code used in this project was written in Python 3 using the PyTorch and Numba packages for automatic differentiation and just-in-time compilation, respectively. No specific version numbers for PyTorch or Numba are provided. (See the version check below.)
Experiment Setup | Yes | We fix a minibatch size of 64 for SGD and SRDA and an epoch length of N = n for LSVRG... The learning rate η is chosen in the set {1×10⁻⁴, 3×10⁻⁴, 1×10⁻³, 3×10⁻³, 1×10⁻², 3×10⁻², 1×10⁻¹, 3×10⁻¹, 1×10⁰, 3×10⁰}, with two orders of magnitude lower values used on acsincome due to its sparsity. We fix the shift cost ν = 1 and regularization parameter µ = 1/n. (See the grid construction below.)
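
For orientation, here is a minimal sketch of the spectral risk objective that Algorithm 1 (Prospect) targets, written as plain full-batch subgradient descent with squared-error losses. This is not the authors' algorithm, which adds bias and variance reduction on top of this objective; the function name, default learning rate, and spectrum handling below are illustrative assumptions.

```python
import numpy as np

def spectral_risk_gd(X, y, sigmas, lr=1e-2, mu=None, iters=500):
    """Sketch only (not Prospect itself): subgradient descent on the
    spectral risk sum_i sigma_i * l_(i)(w) + (mu/2) * ||w||^2, where
    l_(1) <= ... <= l_(n) are the sorted per-example squared errors and
    `sigmas` is a nonnegative, ascending spectrum summing to one."""
    n, d = X.shape
    mu = 1.0 / n if mu is None else mu  # mu = 1/n matches the quoted setup
    w = np.zeros(d)
    for _ in range(iters):
        resid = X @ w - y
        losses = 0.5 * resid ** 2
        q = np.empty(n)
        q[np.argsort(losses)] = sigmas      # largest weight on largest loss
        grad = X.T @ (q * resid) + mu * w   # weighted subgradient + ridge
        w -= lr * grad
    return w
```

For instance, a spectrum that is zero on the smallest 80% of losses and uniform on the rest gives a CVaR-style (superquantile) objective, one of the spectral risks studied in the paper.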
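Several of the tabular benchmarks above are also mirrored on OpenML. As one hedged illustration, the snippet below pulls kin8nm with scikit-learn; the OpenML name, version, and preprocessing are assumptions on my part, not the authors' data-preparation scripts, which live in the linked repository.

```python
from sklearn.datasets import fetch_openml
from sklearn.preprocessing import StandardScaler

# Assumption: kin8nm is fetched by its OpenML name. The full OpenML copy
# has 8192 rows, so the n = 6553 quoted above presumably refers to a
# training split rather than the whole dataset.
X, y = fetch_openml(name="kin8nm", version=1, as_frame=False, return_X_y=True)
y = y.astype(float)                     # regression target
X = StandardScaler().fit_transform(X)   # standardize features
```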
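The tuning protocol in the Dataset Splits row (choose µ and ν by generalization error on a validation set) can be sketched as below. The 80/20 split, the candidate grids, and the `train_eval` helper are hypothetical, introduced only to make the loop concrete; the paper does not specify these.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def tune_mu_nu(X, y, train_eval, seed=0):
    """Grid-search (mu, nu) by validation error. `train_eval` is a
    hypothetical helper that trains on (Xtr, ytr) with the given
    (mu, nu) and returns the error measured on (Xva, yva)."""
    Xtr, Xva, ytr, yva = train_test_split(X, y, test_size=0.2, random_state=seed)
    n = len(ytr)
    best_params, best_err = None, np.inf
    for mu in (0.1 / n, 1.0 / n, 10.0 / n):  # assumed grid around mu = 1/n
        for nu in (0.1, 1.0, 10.0):          # assumed grid around nu = 1
            err = train_eval(Xtr, ytr, Xva, yva, mu, nu)
            if err < best_err:
                best_params, best_err = (mu, nu), err
    return best_params
```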
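Since the Software Dependencies row notes that no version numbers are pinned, a reproduction should at least record the versions it actually ran; a trivial check:

```python
import numba
import torch

# Versions are unpinned in the paper, so log whatever is installed.
print("torch:", torch.__version__)
print("numba:", numba.__version__)
```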
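Finally, the learning-rate set in the Experiment Setup row is a standard 1-3 ladder over five decades; a one-liner reproducing it, plus the two-orders-lower variant the quote prescribes for acsincome:

```python
# 1-3 ladder over 10^-4 .. 10^0, as quoted in the experiment setup
etas = [c * 10.0 ** e for e in range(-4, 1) for c in (1, 3)]
# two orders of magnitude lower for the sparse acsincome dataset
etas_acsincome = [eta * 1e-2 for eta in etas]
print(etas)  # 0.0001, 0.0003, ..., 1.0, 3.0 (up to float rounding)
```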