Stochastic Gradient Methods for Distributionally Robust Optimization with f-divergences

Authors: Hongseok Namkoong, John C. Duchi

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility variables, with the extracted result and supporting LLM response for each:
Research Type: Experimental
LLM Response: "In this section, we present experimental results demonstrating the efficiency of our algorithm. We first compare our method with existing algorithms for solving the robust problem (1) on a synthetic dataset, then investigate the robust formulation on real datasets to show how the calibrated confidence guarantees behave in practice, especially in comparison to ERM. We experiment on natural high-dimensional datasets as well as those with many training examples. Our implementation uses the efficient updates outlined in Section 4. Throughout our experiments, we use the best tuned step sizes for all methods. For the first two experiments, we set ρ = χ²_{1,.9} so that the resulting robust objective (1) will be a calibrated 95% upper confidence bound on the optimal population risk. For our last experiment, the asymptotic regime (3) fails to hold due to the high-dimensional nature of the problem, so we choose ρ = 50 (somewhat arbitrarily, but other ρ give similar behavior). We take X = {x ∈ ℝᵈ : ‖x‖₂ ≤ R} for our experiments."
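As a quick check of the calibrated radius, the quantile χ²_{1,.9} in the quoted setup can be computed numerically. A minimal sketch, assuming SciPy (a tool of our choosing, not one the paper names):

```python
# Sketch: the robustness radius rho = chi^2_{1,.9} from the quoted setup,
# i.e., the 0.9 quantile of a chi-squared distribution with 1 degree of
# freedom, which makes the robust objective a 95% upper confidence bound.
from scipy.stats import chi2

rho = chi2.ppf(0.9, df=1)
print(rho)  # ~2.7055
```
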
Researcher Affiliation: Academia
LLM Response: "Hongseok Namkoong, Stanford University, hnamk@stanford.edu; John C. Duchi, Stanford University, jduchi@stanford.edu"
Pseudocode: Yes
LLM Response:
"Algorithm 1 — Two-player Bandit Mirror Descent
1: Input: stepsizes η_x, η_p > 0; initialize x₁ ∈ X, p₁ = 𝟙/n
2: for t = 1, 2, . . . , T do
3:   Sample I_t ∼ p_t, that is, set I_t = i with probability p_{t,i}
4:   Compute the estimated loss for i ∈ [n]: ℓ̂_{t,i}(x) = (ℓ_i(x) / p_{t,i}) · 1{I_t = i}
5:   Update p: w_{t+1} ← ∇φ_p⁻¹(∇φ_p(p_t) + η_p ℓ̂_t(x_t)), p_{t+1} ← argmin_{p ∈ P_{ρ,n}} B_{φ_p}(p, w_{t+1})
6:   Update x: y_{t+1} ← ∇φ_x⁻¹(∇φ_x(x_t) − η_x g_{I_t}(x_t)), x_{t+1} ← argmin_{x ∈ X} B_{φ_x}(x, y_{t+1})
7: end for"
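To make the quoted pseudocode concrete, here is a minimal NumPy sketch of the two-player update with Euclidean mirror maps for both players. The exact Bregman projection onto P_{ρ,n} from the paper's Section 4 is replaced by a simple feasible-point heuristic, and the loss/gradient oracles are placeholders, so this is an illustration of the scheme rather than the authors' implementation:

```python
# Minimal sketch of Algorithm 1 (two-player bandit mirror descent) with
# Euclidean mirror maps. The exact Bregman projection onto
# P_{rho,n} = {p in simplex : 0.5 * sum((n*p_i - 1)^2) <= rho} (Section 4
# of the paper) is replaced by a simple feasible-point heuristic.
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    idx = np.nonzero(u - css / k > 0)[0][-1]
    tau = css[idx] / (idx + 1.0)
    return np.maximum(v - tau, 0.0)

def feasible_p(v, rho):
    """Heuristic stand-in for the exact Bregman projection: project onto the
    simplex, then shrink toward uniform until 0.5 * ||n*p - 1||^2 <= rho."""
    n = len(v)
    p = project_simplex(v)
    u = np.full(n, 1.0 / n)
    radius = np.sqrt(2.0 * rho) / n          # chi^2 ball radius in p-space
    dist = np.linalg.norm(p - u)
    if dist > radius:
        p = u + (radius / dist) * (p - u)    # convex combination stays feasible
    return p

def bandit_mirror_descent(loss, grad, n, d, rho, R, eta_x, eta_p, T, rng):
    """loss(i, x) -> scalar l_i(x); grad(i, x) -> subgradient of l_i at x."""
    x = np.zeros(d)
    p = np.full(n, 1.0 / n)
    for _ in range(T):
        i = rng.choice(n, p=p)               # sample I_t ~ p_t
        est = loss(i, x) / p[i]              # importance-weighted loss estimate
        v = p.copy()
        v[i] += eta_p * est                  # mirror *ascent* step on p
        p = feasible_p(v, rho)
        x = x - eta_x * grad(i, x)           # mirror descent step on x
        norm = np.linalg.norm(x)
        if norm > R:                         # project onto {||x||_2 <= R}
            x *= R / norm
    return x, p
```

Plugging in, e.g., the hinge loss and a subgradient from the Experiment Setup entry, with tuned step sizes η_x and η_p, completes the loop.
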
Open Source Code: No
LLM Response: "The paper does not provide a direct link to open-source code or explicitly state its availability."
Open Datasets: Yes
LLM Response: "We also perform experiments on two datasets with larger n: the Adult dataset [22] and the Reuters RCV1 Corpus [21]."
Dataset Splits: No
LLM Response: "The Adult dataset has n = 32,561 training and 16,281 test examples... For the Reuters RCV1 Corpus... randomly split the 804,410 examples into 723,969 training (90% of data) and 80,441 (10% of data) test examples." No explicit mention of a validation set is made.
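The reported 90/10 split is straightforward to reproduce in spirit. A minimal sketch using scikit-learn's copy of RCV1; fetch_rcv1 and the seed are our assumptions, since the paper does not name its tooling (note sklearn's copy has 804,414 examples versus the 804,410 reported):

```python
# Sketch of a 90%/10% train/test split of the Reuters RCV1 corpus, mirroring
# the split reported in the paper. fetch_rcv1 and random_state are our
# assumptions; the paper does not specify its tooling or seed.
from sklearn.datasets import fetch_rcv1
from sklearn.model_selection import train_test_split

rcv1 = fetch_rcv1()
X_train, X_test, y_train, y_test = train_test_split(
    rcv1.data, rcv1.target, test_size=0.10, random_state=0
)
print(X_train.shape[0], X_test.shape[0])
```
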
Hardware Specification: No
LLM Response: "The paper only mentions 'CPU time' and does not provide specific details on the hardware used, such as GPU/CPU models, memory, or cloud-computing instances."
Software Dependencies: No
LLM Response: "The paper mentions the 'Gurobi solver [17]' but does not provide a specific version number. No other software dependencies are listed with version details."
Experiment Setup: Yes
LLM Response: "Throughout our experiments, we use the best tuned step sizes for all methods. For the first two experiments, we set ρ = χ²_{1,.9} so that the resulting robust objective (1) will be a calibrated 95% upper confidence bound on the optimal population risk. For our last experiment, the asymptotic regime (3) fails to hold due to the high-dimensional nature of the problem, so we choose ρ = 50 (somewhat arbitrarily, but other ρ give similar behavior). We take X = {x ∈ ℝᵈ : ‖x‖₂ ≤ R} for our experiments. We use the hinge loss ℓ_i(x) = [1 − b_i⟨a_i, x⟩]₊ with n = 2000, d = 500, and R = 10 in our experiment. We use the binary logistic loss ℓ_i(x) = log(1 + exp(−b_i a_iᵀx))."
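For concreteness, the two per-example losses quoted above are easy to write down. A minimal sketch; the function names and the labels-in-{−1, +1} convention are our assumptions:

```python
# Sketch of the per-example losses from the quoted setup: the hinge loss
# l_i(x) = [1 - b_i <a_i, x>]_+ and the binary logistic loss
# l_i(x) = log(1 + exp(-b_i <a_i, x>)), with labels b_i in {-1, +1}.
import numpy as np

def hinge_loss(a_i, b_i, x):
    return max(0.0, 1.0 - b_i * a_i.dot(x))

def logistic_loss(a_i, b_i, x):
    # log(1 + exp(z)) computed stably as logaddexp(0, z)
    return np.logaddexp(0.0, -b_i * a_i.dot(x))
```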