Large-Scale Non-convex Stochastic Constrained Distributionally Robust Optimization

Authors: Qi Zhang, Yi Zhou, Ashley Prater-Bennette, Lixin Shen, Shaofeng Zou

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 6 Numerical Results. In this section, we verify our theoretical results by solving an imbalanced classification problem. In the experiment, we consider a non-convex loss function, and k is set to 2 for the Cressie-Read family. We will show that 1) to optimize the same dual objective function, our proposed algorithm converges faster than the general Proximal Gradient Descent (PGD) algorithm (Ghadimi, Lan, and Zhang 2016); 2) for the constrained DRO problem, the proposed algorithm outperforms or is close to penalized DRO on the worst classes. Both outperform the baseline. Tasks. We conduct experiments on the imbalanced CIFAR10 dataset...
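For context, the Cressie-Read family with k = 2 corresponds to the χ²-divergence, whose DRO objective admits a well-known scalar dual, as it commonly appears in the DRO literature. The sketch below evaluates that dual for a batch of losses; the radius `rho`, the grid search over the dual variable, and the sample losses are illustrative assumptions, not the paper's exact constrained formulation.

```python
import numpy as np

def chi2_dro_dual(losses, eta, rho):
    """Scalar dual of the chi^2-ball (Cressie-Read k = 2) DRO objective:
    sqrt(1 + 2*rho) * sqrt(mean((losses - eta)_+^2)) + eta.
    Minimizing over eta yields the worst-case (robust) expected loss."""
    excess = np.maximum(losses - eta, 0.0)
    return np.sqrt(1.0 + 2.0 * rho) * np.sqrt(np.mean(excess ** 2)) + eta

# Hypothetical per-sample losses and radius; grid search is illustration only.
losses = np.array([0.2, 0.5, 1.5, 3.0])
etas = np.linspace(-1.0, 3.0, 401)
robust_loss = min(chi2_dro_dual(losses, e, rho=1.0) for e in etas)
```

The robust loss always sits between the empirical mean loss (at radius 0) and the maximum per-sample loss (at infinite radius), which is what makes the dual useful for up-weighting hard, under-represented classes.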
Researcher Affiliation | Collaboration | 1. University at Buffalo; 2. The University of Utah; 3. Air Force Research Laboratory; 4. Syracuse University
Pseudocode | Yes | Algorithm 1: SFK-DRO. Input: iteration number K, initial point (x1, z1), sample numbers nx, nz, step size, and one constant C. 1: Let t = 1. 2: while t ≤ K do...
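Only the input list and the loop header survive in the excerpt above, so the following is a purely structural sketch: the two-block minibatch stochastic-gradient update and the use of C as a projection bound are stand-in assumptions, not the paper's actual SFK-DRO update rule.

```python
import numpy as np

def sfk_dro_sketch(grad_x, grad_z, x, z, K, n_x, n_z, step, C):
    """Structural sketch of Algorithm 1 (SFK-DRO).

    Only `Input: K, (x1, z1), nx, nz, step size, C` and `while t <= K`
    are visible in the excerpt; the update below is a generic
    minibatch stochastic-gradient step, NOT the paper's actual rule."""
    t = 1
    while t <= K:
        # Average n_x (resp. n_z) stochastic gradient samples per block.
        gx = np.mean([grad_x(x, z) for _ in range(n_x)], axis=0)
        gz = np.mean([grad_z(x, z) for _ in range(n_z)], axis=0)
        x = x - step * gx
        z = np.clip(z - step * gz, -C, C)  # assumption: C bounds the z block
        t += 1
    return x, z
```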
Open Source Code | No | The paper does not provide any specific links to source code or explicitly state that the code is publicly available or included in supplementary materials.
Open Datasets | Yes | We conduct experiments on the imbalanced CIFAR10 dataset, following the experimental setting in (Jin et al. 2021; Chou et al. 2020). The original CIFAR-10 training dataset consists of 10 classes, where each of the classes has 5000 images.
Dataset Splits | No | We randomly select training samples from the original set for each class with the following sampling ratios: {0.804, 0.543, 0.997, 0.593, 0.390, 0.285, 0.959, 0.806, 0.967, 0.660}. We keep the test dataset unchanged.
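The per-class subsampling quoted above can be reproduced mechanically. The sketch below applies the quoted ratios to a toy label array standing in for the CIFAR-10 training labels (loading the real dataset, e.g. via torchvision, is omitted); the function name and seed are assumptions.

```python
import numpy as np

# Per-class sampling ratios quoted in the report (classes 0..9).
RATIOS = [0.804, 0.543, 0.997, 0.593, 0.390, 0.285, 0.959, 0.806, 0.967, 0.660]

def subsample_by_class(labels, ratios, seed=0):
    """Indices of a class-imbalanced subsample: for each class c, keep
    round(ratios[c] * n_c) randomly chosen training examples."""
    rng = np.random.default_rng(seed)
    keep = []
    for c, r in enumerate(ratios):
        idx = np.flatnonzero(labels == c)
        keep.extend(rng.choice(idx, size=int(round(r * len(idx))), replace=False))
    return np.sort(np.asarray(keep))

# Toy stand-in for CIFAR-10 training labels: 100 examples per class.
labels = np.repeat(np.arange(10), 100)
idx = subsample_by_class(labels, RATIOS)
```

Only the training set is thinned this way; the test set stays untouched, so test accuracy on the rare classes directly measures robustness to the induced imbalance.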
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor speeds, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | Training Details. We set λ1 = 1, 1 = 0, λ0 = 0.1, = 10, and the upper bounds λ = 10, B = 10. To achieve a faster optimization rate, we set the learning rate to 0.01 for the first 40 epochs and 0.001 thereafter. The minibatch size is 128.
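The step learning-rate schedule quoted above can be sketched as follows; the function name and the convention that epochs are zero-indexed (so epochs 0..39 use the larger rate) are assumptions, not from the paper.

```python
def learning_rate(epoch, boundary=40, lr_early=0.01, lr_late=0.001):
    """Step schedule from the report: 0.01 for the first 40 epochs
    (assumed zero-indexed, i.e. epochs 0..39), 0.001 afterwards."""
    return lr_early if epoch < boundary else lr_late
```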