Drago: Primal-Dual Coupled Variance Reduction for Faster Distributionally Robust Optimization
Authors: Ronak Mehta, Jelena Diakonikolas, Zaid Harchaoui
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The theoretical results are supported by numerical benchmarks on regression and classification tasks. |
| Researcher Affiliation | Academia | University of Washington, Seattle; University of Wisconsin, Madison |
| Pseudocode | Yes | Algorithm 1 Distributionally Robust Annular Gradient Optimizer (DRAGO) |
| Open Source Code | Yes | The code to reproduce these experiments can be found at https://github.com/ronakdm/drago. |
| Open Datasets | Yes | We consider regression and classification tasks. Letting (xᵢ, yᵢ) denote a feature-label pair, we have that each ℓᵢ represents the squared error loss or multinomial cross-entropy loss, given by ... yacht (n = 244, d = 6) [Tsanas and Xifara, 2012], energy (n = 614, d = 8) [Baressi Šegota et al., 2020], concrete (n = 824, d = 8) [Yeh, 2006], acsincome (n = 4000, d = 202) [Ding et al., 2021], kin8nm (n = 6553, d = 8) [Akujuobi and Zhang, 2017], and power (n = 7654, d = 4) [Tüfekci, 2014]. |
| Dataset Splits | Yes | In practice, the regularization parameter µ and shift cost ν are tuned by a statistical metric, i.e., generalization error as measured on a validation set. |
| Hardware Specification | Yes | Experiments were run on a CPU workstation with an Intel i9 processor, a clock speed of 2.80GHz, 32 virtual cores, and 126G of memory. |
| Software Dependencies | No | The paper mentions 'Python 3' and the 'Numba' package for just-in-time compilation, and that algorithms are 'primarily written in PyTorch'. However, no version numbers are given for Numba or PyTorch, so the ancillary software environment is not fully reproducible. |
| Experiment Setup | Yes | We fix µ = 1 but vary ν to study its role as a conditioning parameter... We fix a minibatch size of 64 for SGD and an epoch length of N = n for LSVRG. For DRAGO, we investigate the variants in which b is set to 1 and b = n/d a priori, as well as cases when b is a tuned hyperparameter... The learning rate η is chosen in the set {1×10⁻⁴, 3×10⁻⁴, 1×10⁻³, 3×10⁻³, 1×10⁻², 3×10⁻², 1×10⁻¹, 3×10⁻¹, 1×10⁰, 3×10⁰}, with values two orders of magnitude lower used for acsincome due to its sparsity. |
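
The two per-example losses named in the Open Datasets row are standard; the exact formula is elided in the quote above, so the following is a minimal sketch assuming a linear model, not the paper's exact notation:

```python
import numpy as np

def squared_error(w, x, y):
    """Per-example regression loss l_i(w) = (1/2)(<w, x_i> - y_i)^2, linear model assumed."""
    return 0.5 * (np.dot(w, x) - y) ** 2

def multinomial_cross_entropy(W, x, y):
    """Per-example k-class loss l_i(W) = -log softmax(W x_i)[y_i], linear model assumed."""
    logits = W @ x
    logits = logits - logits.max()  # shift logits for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[y]
```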
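The hyperparameter sweep described in the Experiment Setup row can be expressed compactly. The sketch below reproduces the quoted learning-rate grid and the b = 1 and b = n/d DRAGO batch-size variants; `run_trial` is a hypothetical placeholder, not a function from the authors' repository:

```python
import itertools
import math

# Learning-rate grid quoted in the Experiment Setup row:
# {1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 3e-2, 1e-1, 3e-1, 1e0, 3e0}
learning_rates = [c * 10.0 ** e for e in range(-4, 1) for c in (1, 3)]

n, d = 4000, 202                           # acsincome dimensions from the datasets row
drago_batch_sizes = [1, math.ceil(n / d)]  # the b = 1 and b = n/d variants

for eta, b in itertools.product(learning_rates, drago_batch_sizes):
    # run_trial is hypothetical; the actual training loop lives in the
    # authors' repository (https://github.com/ronakdm/drago).
    # run_trial(method="drago", lr=eta, batch_size=b, mu=1.0)
    pass
```

Note that the quoted setup fixes µ = 1 and varies the shift cost ν separately, so a full sweep would add ν as a third axis of the product above.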