Non-convex Distributionally Robust Optimization: Non-asymptotic Analysis

Authors: Jikai Jin, Bohang Zhang, Haiyang Wang, Liwei Wang

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We finally verify our theoretical results in a number of tasks and find that the proposed algorithm can consistently achieve prominent acceleration. ... We perform two sets of experiments to verify our theoretical results. In the first set of experiments, we consider the setting in Section 3.2, where the loss ℓ(x; ξ) is highly non-convex and unbounded, and ψ is chosen to be the commonly used χ2-divergence such that its conjugate is smooth. We will show that (i) the vanilla SGD algorithm cannot optimize this loss efficiently due to the non-smoothness of the DRO objective; (ii) by simply adopting the normalized momentum algorithm, the optimization process can be greatly accelerated. In the second set of experiments, we deal with the CVaR setting in Section 3.4. We will show that by employing the smooth approximation of CVaR defined in (11) and (12), the optimization can be greatly accelerated. ... Figure 1: Training curve of χ2-penalized DRO and smoothed CVaR in regression and classification tasks. Table 2: Test performance of the χ2-penalized DRO problem for unbalanced CIFAR-10 classification. (Both objectives are illustrated in the sketch below the table.)
Researcher Affiliation | Academia | 1 School of Mathematical Sciences, Peking University; 2 Key Laboratory of Machine Perception, MOE, School of EECS, Peking University; 3 Center of Data Science, Peking University; 4 Institute for Artificial Intelligence, Peking University
Pseudocode | Yes | Algorithm 1: Mini-batch Normalized SGD with Momentum (an illustrative sketch of this update rule appears below the table).
Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository.
Open Datasets | Yes | Datasets. We choose the AFAD-Full dataset for regression and the CIFAR-10 dataset for classification. ... AFAD-Full [Niu et al., 2016] is a regression task... The CIFAR-10 dataset is a classification task... we adopt the setting in Chou et al. [2020] to construct an imbalanced CIFAR-10 by randomly sampling each category at a different ratio. (A subsampling sketch appears below the table.)
Dataset Splits | No | We split the whole dataset into a training set comprising 80% of the data and a test set comprising the remaining 20%. The paper does not explicitly mention a validation split.
Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for the experiments.
Software Dependencies | No | The paper mentions using a ResNet-18 model but does not specify any software dependencies (e.g., libraries or frameworks) with version numbers.
Experiment Setup | Yes | We choose the penalty coefficient λ = 0.1 and the CVaR coefficient α = 0.02 in all experiments. For each algorithm, we tune the learning rate via a grid search and pick the value that achieves the fastest optimization. The momentum factor is set to 0.9 in all experiments, and the mini-batch size is 128. We train the model for 100 epochs on the CIFAR-10 dataset and for 200 epochs on the AFAD-Full dataset. (These settings are collected in the configuration sketch below the table.)
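
The two experimental objectives quoted above (χ2-penalized DRO and smoothed CVaR) admit standard dual formulations with a scalar dual variable η trained jointly with the model. The following is a minimal PyTorch-style sketch under two assumptions: the χ2 conjugate is taken in its usual closed form ψ*(t) = (1 + t/2)_+^2 - 1, and a softplus with a hypothetical temperature mu stands in for the paper's exact smoothing in equations (11) and (12).

    # Minimal sketch of the two robust objectives used in the experiments.
    # Assumptions: the chi^2-penalized loss uses the standard dual form with
    # conjugate psi*(t) = (1 + t/2)_+^2 - 1; softplus with temperature `mu`
    # replaces the paper's exact CVaR smoothing (11)-(12).
    import torch
    import torch.nn.functional as F

    def chi2_penalized_loss(per_sample_loss, eta, lam=0.1):
        """lam * E[psi*((loss - eta) / lam)] + eta, with eta trained jointly."""
        t = (per_sample_loss - eta) / lam
        psi_star = torch.clamp(1.0 + 0.5 * t, min=0.0) ** 2 - 1.0
        return lam * psi_star.mean() + eta

    def smoothed_cvar_loss(per_sample_loss, eta, alpha=0.02, mu=0.1):
        """eta + (1/alpha) * E[smooth relu of (loss - eta)], a CVaR surrogate."""
        smooth_plus = mu * F.softplus((per_sample_loss - eta) / mu)
        return eta + smooth_plus.mean() / alpha

Here λ = 0.1 and α = 0.02 match the values in the Experiment Setup row; η would be a learnable scalar (e.g. a torch.nn.Parameter) updated by the same optimizer as the network weights.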
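
Algorithm 1 in the paper is mini-batch normalized SGD with momentum. The update below shows the usual form of that method (a momentum average of stochastic gradients followed by a step along its normalized direction); it is an illustration of the technique, not a verbatim transcription of the paper's pseudocode.

    # One step of normalized SGD with momentum on a flat parameter vector.
    # Illustrative only; hyper-parameter names are placeholders. Call inside
    # torch.no_grad() when `params` requires gradients.
    import torch

    def normalized_momentum_step(params, grads, state, lr=0.1, beta=0.9, eps=1e-12):
        """m <- beta * m + (1 - beta) * g, then x <- x - lr * m / ||m||."""
        if "momentum" not in state:
            state["momentum"] = torch.zeros_like(grads)
        m = state["momentum"].mul_(beta).add_(grads, alpha=1.0 - beta)
        params.sub_(lr * m / (m.norm() + eps))
        return params

Normalizing by ||m|| keeps the step size insensitive to the gradient scale, which is the intuition behind using this update on the non-smooth DRO objective where, per the quoted experiments, vanilla SGD stalls.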
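
The unbalanced CIFAR-10 benchmark is built by subsampling each class at a different ratio, following Chou et al. [2020]. The sketch below shows one way to do this; the geometric decay of the ratios is a placeholder, since the exact per-class ratios are not listed in the quoted text.

    # Build an imbalanced CIFAR-10 training set by keeping a different
    # fraction of each class. The ratios below are placeholders, not the
    # values used in the paper (which follows Chou et al. [2020]).
    import numpy as np
    from torch.utils.data import Subset
    from torchvision.datasets import CIFAR10

    def make_imbalanced_cifar10(root, ratios, seed=0):
        """Keep a fraction `ratios[c]` of the training images of class c."""
        train = CIFAR10(root=root, train=True, download=True)
        targets = np.array(train.targets)
        rng = np.random.default_rng(seed)
        keep = []
        for c, r in enumerate(ratios):
            idx = np.flatnonzero(targets == c)
            keep.extend(rng.choice(idx, size=int(r * len(idx)), replace=False).tolist())
        return Subset(train, sorted(keep))

    # Placeholder imbalance: class 0 keeps the most images, class 9 the fewest.
    ratios = [0.5 ** (c / 3) for c in range(10)]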
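
For reference, the hyper-parameters stated in the Experiment Setup row can be gathered into a single configuration. The learning rate is tuned by grid search in the paper, so the grid shown here is an assumption.

    # Hyper-parameters reported in the paper, collected in one place.
    # The learning-rate grid is an assumption; the paper only states that
    # the learning rate is tuned by grid search for each algorithm.
    CONFIG = {
        "penalty_lambda": 0.1,     # chi^2 penalty coefficient
        "cvar_alpha": 0.02,        # CVaR level
        "momentum": 0.9,           # momentum factor for normalized SGD
        "batch_size": 128,         # mini-batch size
        "epochs_cifar10": 100,     # imbalanced CIFAR-10 classification
        "epochs_afad": 200,        # AFAD-Full regression
        "lr_grid": [1.0, 0.3, 0.1, 0.03, 0.01],  # hypothetical search grid
    }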