Towards Sharper Risk Bounds for Minimax Problems

Authors: Bowei Zhu, Shaojie Li, Yong Liu

IJCAI 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Theoretical | For theoretical analysis, current optimal excess risk bounds, which are composed of generalization error and optimization error, present 1/n-rates in strongly-convex-strongly-concave (SC-SC) settings. Existing studies mainly focus on minimax problems with specific algorithms for the optimization error, with only a few studies on generalization performance, which limits the sharpness of attainable excess risk bounds. In this paper, we study generalization bounds measured by the gradients of primal functions using uniform localized convergence. We obtain a sharper high-probability generalization error bound for nonconvex-strongly-concave (NC-SC) stochastic minimax problems. Furthermore, we provide dimension-independent results under the Polyak-Łojasiewicz (PL) condition for the outer layer. Based on our generalization error bound, we analyze some popular algorithms such as empirical saddle point (ESP), gradient descent ascent (GDA), and stochastic gradient descent ascent (SGDA). We derive better excess primal risk bounds under further reasonable assumptions, which, to the best of our knowledge, are n times faster than existing results for minimax problems. (The excess primal risk decomposition underlying these bounds is sketched below the table.) |
| Researcher Affiliation | Academia | Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China |
| Pseudocode | Yes | Algorithm 1: Two-timescale GDA for minimax problems; Algorithm 2: Two-timescale SGDA for minimax problems (see the code sketch below the table) |
| Open Source Code | No | The paper does not contain any statements about releasing code, nor does it provide links to a code repository for the methodology described. |
| Open Datasets | No | The paper is theoretical and focuses on bounds and algorithms rather than empirical experiments with datasets; it does not mention specific training datasets or their public availability. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with data, so no training, validation, or test splits are discussed. |
| Hardware Specification | No | The paper is theoretical and does not describe any empirical experiments, hence no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not include details about specific software dependencies or version numbers required to reproduce any empirical experiments. |
| Experiment Setup | No | The paper is theoretical and does not describe any empirical experimental setup details, hyperparameters, or training configurations. |
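The abstract's split of excess risk into generalization error plus optimization error follows the standard decomposition for stochastic minimax problems. Below is a minimal LaTeX sketch under assumed notation (the symbols f, F, F_S, P, P_S, and the output ŵ are illustrative conventions, not copied from the paper):

```latex
% Population objective F(w,v) = E_z[ f(w,v;z) ]; empirical objective
% F_S(w,v) = (1/n) \sum_{i=1}^n f(w,v;z_i) over a sample S = {z_1,...,z_n}.
% Primal functions: P(w) = max_v F(w,v) and P_S(w) = max_v F_S(w,v).
% For an algorithm output \hat{w} and any minimizer w^* of P, the excess
% primal risk splits exactly as:
\begin{align*}
P(\hat{w}) - P(w^*)
  &= \underbrace{\bigl[P(\hat{w}) - P_S(\hat{w})\bigr]}_{\text{generalization}}
   + \underbrace{\bigl[P_S(\hat{w}) - P_S(w^*)\bigr]}_{\le\ \text{optimization error}}
   + \underbrace{\bigl[P_S(w^*) - P(w^*)\bigr]}_{\text{generalization}}
\end{align*}
```

The paper's stated contribution targets the generalization terms (bounded via uniform localized convergence and measured by gradients of the primal function); pairing the sharper generalization bound with known optimization-error rates for ESP, GDA, and SGDA yields the claimed faster excess primal risk bounds.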
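The pseudocode row above refers to the paper's two-timescale GDA and SGDA (Algorithms 1 and 2). Since the paper gives pseudocode only, the following Python sketch is a hypothetical reconstruction of the generic scheme; the function name, the toy objective, and the step sizes eta_w/eta_v are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def two_timescale_sgda(grad_w, grad_v, w0, v0, eta_w, eta_v, T,
                       sample=None, rng=None):
    """Generic two-timescale (stochastic) gradient descent ascent.

    With sample=None the gradients receive z=None and the loop behaves
    as deterministic GDA (Algorithm 1 style); supplying a minibatch
    sampler makes it SGDA (Algorithm 2 style). Distinct step sizes
    eta_w != eta_v give the two timescales, typically eta_w < eta_v in
    nonconvex-strongly-concave problems.
    """
    w = np.asarray(w0, dtype=float)
    v = np.asarray(v0, dtype=float)
    for _ in range(T):
        z = sample(rng) if sample is not None else None
        gw = grad_w(w, v, z)  # (stochastic) gradient for the min player
        gv = grad_v(w, v, z)  # (stochastic) gradient for the max player
        w = w - eta_w * gw    # descent step on w
        v = v + eta_v * gv    # ascent step on v
    return w, v

# Toy SC-SC example: f(w, v) = 0.5*w**2 + w*v - 0.5*v**2, saddle at (0, 0).
w, v = two_timescale_sgda(
    grad_w=lambda w, v, z: w + v,
    grad_v=lambda w, v, z: w - v,
    w0=1.0, v0=1.0, eta_w=0.05, eta_v=0.1, T=2000,
)
print(w, v)  # both iterates approach the saddle point (0, 0)
```

The two-timescale choice mirrors the NC-SC setting analyzed in the paper: the inner maximization is strongly concave and thus easier, so it is tracked with the larger step size.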