Two-timescale Extragradient for Finding Local Minimax Points
Authors: Jiseok Chae, Kyuwon Kim, Donghwan Kim
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This work provably improves upon all previous results on finding local minimax points, by eliminating a crucial assumption that the Hessian with respect to the maximization variable is nondegenerate. In Section 3, we derive a second order characterization of local minimax points, without assuming that yyf is nondegenerate. In Section 4, we develop new tools to achieve the main results in Section 5. These tools include a new spectral analysis that does not rely on the nondegeneracy assumption on yyf, as well as the concept of hemicurvature to better understand the behavior of eigenvalues. In Section 5, from a dynamical system perspective, we show that the limit points of the two-timescale EG in the continuous time limit are the local minimax points under mild conditions, by establishing a second order property of those points. This continuous-time analysis is then used to derive a similar conclusion in the discrete-time case. In the discretetime case, Section 5.4 further shows that two-timescale EG almost surely avoids (undesirable) strict non-minimax points, as desired, while Section 4.3 shows that two-timescale GDA may avoid (desirable) local minimax points where yyf is degenerate. In Section 6, we extend the local result of Section 5 to a global statement: under the Minty variational inequality (MVI) condition (Minty, 1967) and additional mild conditions, the two-timescale EG globally converges to local minimax points. |
| Researcher Affiliation | Academia | Jiseok Chae, Kyuwon Kim & Donghwan Kim Department of Mathematical Sciences, KAIST {jsch,kkw4053,donghwankim}@kaist.ac.kr |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. Algorithms like GDA and EG are described mathematically in text, but not in a pseudocode format. |
| Open Source Code | No | The paper does not provide any concrete access to source code, such as a repository link or an explicit code release statement. |
| Open Datasets | No | This paper is theoretical and does not conduct experiments involving datasets. Therefore, no information about publicly available training data is provided. |
| Dataset Splits | No | This paper is theoretical and does not conduct experiments involving dataset splits. Therefore, no information about validation splits is provided. |
| Hardware Specification | No | This paper is theoretical and does not conduct experiments. Therefore, no hardware specifications are provided. |
| Software Dependencies | No | This paper is theoretical and does not conduct experiments. Therefore, no specific ancillary software details with version numbers are provided. |
| Experiment Setup | No | This paper is theoretical and does not conduct experiments. Therefore, no specific experimental setup details, such as hyperparameter values or training configurations, are provided. |