Non-convex Distributionally Robust Optimization: Non-asymptotic Analysis

Authors: Jikai Jin, Bohang Zhang, Haiyang Wang, Liwei Wang

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We finally verify our theoretical results in a number of tasks and find that the proposed algorithm can consistently achieve prominent acceleration. ... We perform two sets of experiments to verify our theoretical results. In the first set of experiments, we consider the setting in Section 3.2, where the loss ℓ(x; ξ) is highly non-convex and unbounded, and ψ is chosen to be the commonly used χ2-divergence such that its conjugate is smooth. We will show that (i) the vanilla SGD algorithm cannot optimize this loss efficiently due to the non-smoothness of the DRO objective; (ii) by simply adopting the normalized momentum algorithm, the optimization process can be greatly accelerated. In the second set of experiments, we deal with the CVaR setting in Section 3.4. We will show that by employing the smooth approximation of CVaR defined in (11) and (12), the optimization can be greatly accelerated. ... Figure 1: Training curve of χ2-penalized DRO and smoothed CVaR in regression and classification tasks. Table 2: Test performance of the χ2-penalized DRO problem for unbalanced CIFAR-10 classification. (Both objectives are illustrated in the sketch below the table.)
Researcher Affiliation | Academia | 1 School of Mathematical Sciences, Peking University; 2 Key Laboratory of Machine Perception, MOE, School of EECS, Peking University; 3 Center of Data Science, Peking University; 4 Institute for Artificial Intelligence, Peking University
Pseudocode | Yes | Algorithm 1: Mini-batch Normalized SGD with Momentum (an illustrative sketch of this update rule appears below the table).
Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository.
Open Datasets | Yes | Datasets. We choose the AFAD-Full dataset for regression and the CIFAR-10 dataset for classification. ... AFAD-Full [Niu et al., 2016] is a regression task... The CIFAR-10 dataset is a classification task... we adopt the setting in Chou et al. [2020] to construct an imbalanced CIFAR-10 by randomly sampling each category at a different ratio. (A subsampling sketch appears below the table.)
Dataset Splits | No | We split the whole dataset into a training set comprising 80% of the data and a test set comprising the remaining 20%. The paper does not explicitly mention a validation split.
Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for the experiments.
Software Dependencies | No | The paper mentions using a ResNet-18 model but does not specify any software dependencies (e.g., libraries or frameworks) with version numbers.
Experiment Setup | Yes | We choose the penalty coefficient λ = 0.1 and the CVaR coefficient α = 0.02 in all experiments. For each algorithm, we tune the learning rate via a grid search and pick the value that achieves the fastest optimization. The momentum factor is set to 0.9 in all experiments, and the mini-batch size is 128. We train the model for 100 epochs on the CIFAR-10 dataset and for 200 epochs on the AFAD-Full dataset. (These settings are collected in the configuration sketch below the table.)
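
The two experimental objectives quoted above (χ2-penalized DRO and smoothed CVaR) admit standard dual formulations with a scalar dual variable η trained jointly with the model. The following is a minimal PyTorch-style sketch under two assumptions: the χ2 conjugate is taken in its usual closed form ψ*(t) = (1 + t/2)_+^2 - 1, and a softplus with a hypothetical temperature mu stands in for the paper's exact smoothing in equations (11) and (12).

    # Minimal sketch of the two robust objectives used in the experiments.
    # Assumptions: the chi^2-penalized loss uses the standard dual form with
    # conjugate psi*(t) = (1 + t/2)_+^2 - 1; softplus with temperature `mu`
    # replaces the paper's exact CVaR smoothing (11)-(12).
    import torch
    import torch.nn.functional as F

    def chi2_penalized_loss(per_sample_loss, eta, lam=0.1):
        """lam * E[psi*((loss - eta) / lam)] + eta, with eta trained jointly."""
        t = (per_sample_loss - eta) / lam
        psi_star = torch.clamp(1.0 + 0.5 * t, min=0.0) ** 2 - 1.0
        return lam * psi_star.mean() + eta

    def smoothed_cvar_loss(per_sample_loss, eta, alpha=0.02, mu=0.1):
        """eta + (1/alpha) * E[smooth relu of (loss - eta)], a CVaR surrogate."""
        smooth_plus = mu * F.softplus((per_sample_loss - eta) / mu)
        return eta + smooth_plus.mean() / alpha

Here λ = 0.1 and α = 0.02 match the values in the Experiment Setup row; η would be a learnable scalar (e.g. a torch.nn.Parameter) updated by the same optimizer as the network weights.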
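
Algorithm 1 in the paper is mini-batch normalized SGD with momentum. The update below shows the usual form of that method (a momentum average of stochastic gradients followed by a step along its normalized direction); it is an illustration of the technique, not a verbatim transcription of the paper's pseudocode.

    # One step of normalized SGD with momentum on a flat parameter vector.
    # Illustrative only; hyper-parameter names are placeholders. Call inside
    # torch.no_grad() when `params` requires gradients.
    import torch

    def normalized_momentum_step(params, grads, state, lr=0.1, beta=0.9, eps=1e-12):
        """m <- beta * m + (1 - beta) * g, then x <- x - lr * m / ||m||."""
        if "momentum" not in state:
            state["momentum"] = torch.zeros_like(grads)
        m = state["momentum"].mul_(beta).add_(grads, alpha=1.0 - beta)
        params.sub_(lr * m / (m.norm() + eps))
        return params

Normalizing by ||m|| keeps the step size insensitive to the gradient scale, which is the intuition behind using this update on the non-smooth DRO objective where, per the quoted experiments, vanilla SGD stalls.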
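
The unbalanced CIFAR-10 benchmark is built by subsampling each class at a different ratio, following Chou et al. [2020]. The sketch below shows one way to do this; the geometric decay of the ratios is a placeholder, since the exact per-class ratios are not listed in the quoted text.

    # Build an imbalanced CIFAR-10 training set by keeping a different
    # fraction of each class. The ratios below are placeholders, not the
    # values used in the paper (which follows Chou et al. [2020]).
    import numpy as np
    from torch.utils.data import Subset
    from torchvision.datasets import CIFAR10

    def make_imbalanced_cifar10(root, ratios, seed=0):
        """Keep a fraction `ratios[c]` of the training images of class c."""
        train = CIFAR10(root=root, train=True, download=True)
        targets = np.array(train.targets)
        rng = np.random.default_rng(seed)
        keep = []
        for c, r in enumerate(ratios):
            idx = np.flatnonzero(targets == c)
            keep.extend(rng.choice(idx, size=int(r * len(idx)), replace=False).tolist())
        return Subset(train, sorted(keep))

    # Placeholder imbalance: class 0 keeps the most images, class 9 the fewest.
    ratios = [0.5 ** (c / 3) for c in range(10)]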
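
For reference, the hyper-parameters stated in the Experiment Setup row can be gathered into a single configuration. The learning rate is tuned by grid search in the paper, so the grid shown here is an assumption.

    # Hyper-parameters reported in the paper, collected in one place.
    # The learning-rate grid is an assumption; the paper only states that
    # the learning rate is tuned by grid search for each algorithm.
    CONFIG = {
        "penalty_lambda": 0.1,     # chi^2 penalty coefficient
        "cvar_alpha": 0.02,        # CVaR level
        "momentum": 0.9,           # momentum factor for normalized SGD
        "batch_size": 128,         # mini-batch size
        "epochs_cifar10": 100,     # imbalanced CIFAR-10 classification
        "epochs_afad": 200,        # AFAD-Full regression
        "lr_grid": [1.0, 0.3, 0.1, 0.03, 0.01],  # hypothetical search grid
    }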