Stochastic Gradient Methods for Distributionally Robust Optimization with f-divergences

Authors: Hongseok Namkoong, John C. Duchi

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility variables, with the extracted result and supporting LLM response for each:
Research Type: Experimental
LLM Response: "In this section, we present experimental results demonstrating the efficiency of our algorithm. We first compare our method with existing algorithms for solving the robust problem (1) on a synthetic dataset, then investigate the robust formulation on real datasets to show how the calibrated confidence guarantees behave in practice, especially in comparison to ERM. We experiment on natural high-dimensional datasets as well as those with many training examples. Our implementation uses the efficient updates outlined in Section 4. Throughout our experiments, we use the best tuned step sizes for all methods. For the first two experiments, we set ρ = χ²_{1,.9} so that the resulting robust objective (1) will be a calibrated 95% upper confidence bound on the optimal population risk. For our last experiment, the asymptotic regime (3) fails to hold due to the high-dimensional nature of the problem, so we choose ρ = 50 (somewhat arbitrarily, but other ρ give similar behavior). We take X = {x ∈ ℝᵈ : ‖x‖₂ ≤ R} for our experiments."
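As a quick check of the calibrated radius, the quantile χ²_{1,.9} in the quoted setup can be computed numerically. A minimal sketch, assuming SciPy (a tool of our choosing, not one the paper names):

```python
# Sketch: the robustness radius rho = chi^2_{1,.9} from the quoted setup,
# i.e., the 0.9 quantile of a chi-squared distribution with 1 degree of
# freedom, which makes the robust objective a 95% upper confidence bound.
from scipy.stats import chi2

rho = chi2.ppf(0.9, df=1)
print(rho)  # ~2.7055
```
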
Researcher Affiliation: Academia
LLM Response: "Hongseok Namkoong, Stanford University, hnamk@stanford.edu; John C. Duchi, Stanford University, jduchi@stanford.edu"
Pseudocode: Yes
LLM Response:
"Algorithm 1 — Two-player Bandit Mirror Descent
1: Input: stepsizes η_x, η_p > 0; initialize x₁ ∈ X, p₁ = 𝟙/n
2: for t = 1, 2, . . . , T do
3:   Sample I_t ∼ p_t, that is, set I_t = i with probability p_{t,i}
4:   Compute the estimated loss for i ∈ [n]: ℓ̂_{t,i}(x) = (ℓ_i(x) / p_{t,i}) · 1{I_t = i}
5:   Update p: w_{t+1} ← ∇φ_p⁻¹(∇φ_p(p_t) + η_p ℓ̂_t(x_t)), p_{t+1} ← argmin_{p ∈ P_{ρ,n}} B_{φ_p}(p, w_{t+1})
6:   Update x: y_{t+1} ← ∇φ_x⁻¹(∇φ_x(x_t) − η_x g_{I_t}(x_t)), x_{t+1} ← argmin_{x ∈ X} B_{φ_x}(x, y_{t+1})
7: end for"
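To make the quoted pseudocode concrete, here is a minimal NumPy sketch of the two-player update with Euclidean mirror maps for both players. The exact Bregman projection onto P_{ρ,n} from the paper's Section 4 is replaced by a simple feasible-point heuristic, and the loss/gradient oracles are placeholders, so this is an illustration of the scheme rather than the authors' implementation:

```python
# Minimal sketch of Algorithm 1 (two-player bandit mirror descent) with
# Euclidean mirror maps. The exact Bregman projection onto
# P_{rho,n} = {p in simplex : 0.5 * sum((n*p_i - 1)^2) <= rho} (Section 4
# of the paper) is replaced by a simple feasible-point heuristic.
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    idx = np.nonzero(u - css / k > 0)[0][-1]
    tau = css[idx] / (idx + 1.0)
    return np.maximum(v - tau, 0.0)

def feasible_p(v, rho):
    """Heuristic stand-in for the exact Bregman projection: project onto the
    simplex, then shrink toward uniform until 0.5 * ||n*p - 1||^2 <= rho."""
    n = len(v)
    p = project_simplex(v)
    u = np.full(n, 1.0 / n)
    radius = np.sqrt(2.0 * rho) / n          # chi^2 ball radius in p-space
    dist = np.linalg.norm(p - u)
    if dist > radius:
        p = u + (radius / dist) * (p - u)    # convex combination stays feasible
    return p

def bandit_mirror_descent(loss, grad, n, d, rho, R, eta_x, eta_p, T, rng):
    """loss(i, x) -> scalar l_i(x); grad(i, x) -> subgradient of l_i at x."""
    x = np.zeros(d)
    p = np.full(n, 1.0 / n)
    for _ in range(T):
        i = rng.choice(n, p=p)               # sample I_t ~ p_t
        est = loss(i, x) / p[i]              # importance-weighted loss estimate
        v = p.copy()
        v[i] += eta_p * est                  # mirror *ascent* step on p
        p = feasible_p(v, rho)
        x = x - eta_x * grad(i, x)           # mirror descent step on x
        norm = np.linalg.norm(x)
        if norm > R:                         # project onto {||x||_2 <= R}
            x *= R / norm
    return x, p
```

Plugging in, e.g., the hinge loss and a subgradient from the Experiment Setup entry, with tuned step sizes η_x and η_p, completes the loop.
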
Open Source Code: No
LLM Response: "The paper does not provide a direct link to open-source code or explicitly state its availability."
Open Datasets: Yes
LLM Response: "We also perform experiments on two datasets with larger n: the Adult dataset [22] and the Reuters RCV1 Corpus [21]."
Dataset Splits: No
LLM Response: "The Adult dataset has n = 32,561 training and 16,281 test examples... For the Reuters RCV1 Corpus... randomly split the 804,410 examples into 723,969 training (90% of data) and 80,441 (10% of data) test examples." No explicit mention of a validation set is made.
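The reported 90/10 split is straightforward to reproduce in spirit. A minimal sketch using scikit-learn's copy of RCV1; fetch_rcv1 and the seed are our assumptions, since the paper does not name its tooling (note sklearn's copy has 804,414 examples versus the 804,410 reported):

```python
# Sketch of a 90%/10% train/test split of the Reuters RCV1 corpus, mirroring
# the split reported in the paper. fetch_rcv1 and random_state are our
# assumptions; the paper does not specify its tooling or seed.
from sklearn.datasets import fetch_rcv1
from sklearn.model_selection import train_test_split

rcv1 = fetch_rcv1()
X_train, X_test, y_train, y_test = train_test_split(
    rcv1.data, rcv1.target, test_size=0.10, random_state=0
)
print(X_train.shape[0], X_test.shape[0])
```
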
Hardware Specification: No
LLM Response: "The paper only mentions 'CPU time' and does not provide specific details on the hardware used, such as GPU/CPU models, memory, or cloud-computing instances."
Software Dependencies: No
LLM Response: "The paper mentions the 'Gurobi solver [17]' but does not provide a specific version number. No other software dependencies are listed with version details."
Experiment Setup: Yes
LLM Response: "Throughout our experiments, we use the best tuned step sizes for all methods. For the first two experiments, we set ρ = χ²_{1,.9} so that the resulting robust objective (1) will be a calibrated 95% upper confidence bound on the optimal population risk. For our last experiment, the asymptotic regime (3) fails to hold due to the high-dimensional nature of the problem, so we choose ρ = 50 (somewhat arbitrarily, but other ρ give similar behavior). We take X = {x ∈ ℝᵈ : ‖x‖₂ ≤ R} for our experiments. We use the hinge loss ℓ_i(x) = [1 − b_i⟨a_i, x⟩]₊ with n = 2000, d = 500, and R = 10 in our experiment. We use the binary logistic loss ℓ_i(x) = log(1 + exp(−b_i a_iᵀx))."
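For concreteness, the two per-example losses quoted above are easy to write down. A minimal sketch; the function names and the labels-in-{−1, +1} convention are our assumptions:

```python
# Sketch of the per-example losses from the quoted setup: the hinge loss
# l_i(x) = [1 - b_i <a_i, x>]_+ and the binary logistic loss
# l_i(x) = log(1 + exp(-b_i <a_i, x>)), with labels b_i in {-1, +1}.
import numpy as np

def hinge_loss(a_i, b_i, x):
    return max(0.0, 1.0 - b_i * a_i.dot(x))

def logistic_loss(a_i, b_i, x):
    # log(1 + exp(z)) computed stably as logaddexp(0, z)
    return np.logaddexp(0.0, -b_i * a_i.dot(x))
```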