Efficient Mirror Descent Ascent Methods for Nonsmooth Minimax Problems

Authors: Feihu Huang, Xidong Wu, Heng Huang

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct the experiments on fair classifier and robust neural network training tasks to demonstrate the efficiency of our new algorithms.
Researcher Affiliation | Academia | Feihu Huang, Xidong Wu, Heng Huang; Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, USA; huangfeihu2018@gmail.com, xidong-wu@pitt.edu, heng.huang@pitt.edu
Pseudocode | Yes | Algorithm 1: (Stochastic) Mirror Descent Ascent Algorithm; Algorithm 2: Accelerated Stochastic Mirror Descent Ascent (VR-SMDA) Algorithm. (A hedged sketch of the variance-reduced gradient estimator suggested by Algorithm 2's batch parameters appears after the table.)
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | The Fashion-MNIST and MNIST datasets consist of 28×28 grayscale images classified into 10 categories, each with 60,000 training images and 10,000 testing images. The CIFAR-10 dataset includes 60,000 32×32 colour images (50,000 training images and 10,000 testing images).
Dataset Splits | No | The paper provides training and testing image counts but does not explicitly mention validation splits or counts.
Hardware Specification | Yes | The experiments are run on CPU machines with a 2.3 GHz Intel Core i9 as well as an NVIDIA Tesla P40 GPU.
Software Dependencies | No | The paper does not specify software versions for any ancillary software dependencies (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | For fair comparison, we use the same step size for all methods: the step size for w is 0.001 and the step size for y is 0.00001. We apply Xavier normal initialization to the CNN layers. In our algorithms, we choose the mirror functions ψ_t(w) = (1/2) w^T H_t w and φ_t(y) = (1/2) y^T G_t y for all t ≥ 1, where H_t and G_t are generated from (12) and (13) respectively, given α = 0.1 and ρ = 0.00005. We set η = η_t = 1 in our algorithms. We run all deterministic algorithms for 1000 seconds and all stochastic algorithms for 50 epochs, and then record the loss value. For the stochastic methods, the batch sizes of PASGDA and SMDA are 3000; for our VR-SMDA, we set the large batch size b = 60000 and the mini-batch size b_1 = q = 3000. In this experiment, we set ν_1 = 0.0001 and ν_2 = 0.1 in problem (31), and K = 5 in problem (30). In the second experiment, we again use the same step size for all methods: the step size for w is 0.0005 and the step size for u is 0.00001. We set η = η_t = 1 in our algorithms and choose the mirror functions ψ_t(w) = (1/2) w^T H_t w and φ_t(u) = (1/2) u^T G_t u for all t ≥ 1, where H_t and G_t are generated from (12) and (13) respectively, given α = 0.1 and ρ = 0.0005. Here we only conduct experiments with stochastic methods; the batch sizes of PASGDA and SMDA are 600, and for our VR-SMDA we set b = 1200 and b_1 = q = 600. Following [38], we set ε = 0.4 in problem (28). (A hedged sketch of the resulting update step appears after the table.)
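The setup above pairs quadratic mirror functions with adaptive matrices H_t and G_t, which, with η_t = 1, reduces each (stochastic) mirror descent ascent step to a preconditioned gradient descent ascent step. The sketch below illustrates that update in the smooth, unconstrained case only. The diagonal AdaGrad-style construction of H_t and G_t, the grad_w/grad_y callbacks, and the default hyperparameters (taken from the reported values) are assumptions for illustration, not the paper's exact equations (12)-(13) or implementation.

```python
# Minimal sketch of a (stochastic) mirror descent ascent step for
# min_w max_y f(w, y), assuming the smooth unconstrained case and the
# quadratic mirror functions psi_t(w) = 0.5 * w^T H_t w and
# phi_t(y) = 0.5 * y^T G_t y reported above.  With eta_t = 1 the mirror
# step reduces to a preconditioned stochastic gradient step.  The diagonal
# AdaGrad-style construction of H_t and G_t below is an assumption standing
# in for the paper's equations (12)-(13).
import numpy as np

def smda(grad_w, grad_y, w0, y0, gamma=1e-3, lam=1e-5,
         alpha=0.1, rho=5e-5, iters=1000):
    """grad_w/grad_y are caller-supplied (stochastic) gradient oracles."""
    w, y = w0.copy(), y0.copy()
    sum_sq_w = np.zeros_like(w)   # running squared-gradient accumulators
    sum_sq_y = np.zeros_like(y)
    for t in range(1, iters + 1):
        gw, gy = grad_w(w, y), grad_y(w, y)
        sum_sq_w += gw ** 2
        sum_sq_y += gy ** 2
        # Diagonal adaptive matrices H_t, G_t (assumed form), kept positive by rho.
        H = alpha * np.sqrt(sum_sq_w / t) + rho
        G = alpha * np.sqrt(sum_sq_y / t) + rho
        # Mirror descent on w (minimization), mirror ascent on y (maximization).
        w = w - gamma * gw / H
        y = y + lam * gy / G
    return w, y
```

With quadratic mirror maps the Bregman proximal step has this closed form; the paper's algorithms additionally handle nonsmooth regularizers through proximal mirror steps, which this sketch omits.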
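The VR-SMDA settings report a large batch b, a mini-batch b_1, and a period q with b_1 = q, which matches the structure of a SPIDER-style variance-reduced gradient estimator: refresh the estimate with a large batch every q iterations, otherwise apply a recursive mini-batch correction. The sketch below is a guess at that structure rather than a transcription of Algorithm 2; grad_batch and data are hypothetical placeholders supplied by the caller.

```python
# Hedged sketch of a SPIDER-style variance-reduced gradient estimator, the
# kind of construction the reported large batch b and mini-batch/period
# b_1 = q suggest; the exact estimator in Algorithm 2 may differ.
import numpy as np

def vr_gradient(t, x, x_prev, v_prev, grad_batch, data, b, b1, q, rng):
    """Return a variance-reduced gradient estimate v_t at iterate x.

    grad_batch(x, idx) computes the mini-batch gradient over the samples
    indexed by idx (a hypothetical helper supplied by the caller).
    """
    n = len(data)
    if t % q == 0:
        # Periodically refresh with a large batch of size b.
        idx = rng.choice(n, size=min(b, n), replace=False)
        return grad_batch(x, idx)
    # Otherwise, recursive correction with a mini-batch of size b1.
    idx = rng.choice(n, size=b1, replace=False)
    return grad_batch(x, idx) - grad_batch(x_prev, idx) + v_prev
```

Such an estimate would replace the plain stochastic gradient before the preconditioned descent/ascent update, which is how the accelerated variant reduces gradient variance without using the large batch at every iteration.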