Competitive Gradient Optimization
Authors: Abhijeet Vyas, Brian Bullins, Kamyar Azizzadenesheli
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide a continuous-time analysis of CGO and its convergence properties while showing that in the continuous limit, previous methods degenerate to their gradient descent ascent (GDA) variants. We further provide a rate of convergence to stationary points in the discrete-time setting. We propose a generalized class of α-coherent functions and show that for strictly α-coherent functions, CGO ensures convergence to a saddle point. Moreover, we propose optimistic CGO (oCGO), an optimistic variant, for which we show a convergence rate of $O(1/n)$ to saddle points for α-coherent functions. (...) Section 5.3. Simulation of CGO and oCGO on families from Section 4: We now evaluate the performance of CGO and oCGO on families discussed in examples (4.1) and (4.2). We first consider a function $f(x, y) = x^\top A y$, $A \in \mathbb{R}^{m \times n}$, $x \in \mathbb{R}^m$, $y \in \mathbb{R}^n$. We sample all the entries of A independently from a standard Gaussian, $A = (a_{ij})$, $a_{ij} \sim \mathcal{N}(0, 1)$. We consider the plot of the $L_2$ norm of x vs. that of y; since the only saddle point is the origin, the desired solution is $\|x\|_2, \|y\|_2 \to 0$. We plot the iterates of CGO and oCGO for different α, and observe that oCGO converges to the saddle point for $\alpha \ge 0$ (at a very slow rate for $\alpha = 0$) while CGO does so for $\alpha > 0$. The results at $\alpha = 0$ are those of GDA and optimistic GDA. We see similar results for the case where A is the scalar 1, i.e. $f(x, y) = xy$. This is in accordance with the analysis in example (4.1). The code for the experiments is available at Link to code. |
| Researcher Affiliation | Collaboration | ¹Department of Computer Science, Purdue University, West Lafayette, IN 47907; ²Nvidia, Santa Clara, CA 95050. Correspondence to: Abhijeet Vyas <vyas26@purdue.edu>. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It describes the update rules using mathematical equations and text. |
| Open Source Code | Yes | The code for the experiments is available at Link to code |
| Open Datasets | No | The paper uses mathematical functions for simulations, defined within the paper (e.g., $f(x, y) = x^\top A y$, $f(x, y) = xy$, $f(x, y) = \frac{k}{2}(x^2 - y^2) + xy$, $f(x, y) = x^2 y + kxy$). It does not provide concrete access information (link, DOI, formal citation) for a publicly available or open dataset. |
| Dataset Splits | No | The paper simulates algorithms on mathematical functions and does not discuss dataset splits (training, validation, test) in terms of percentages, counts, or references to predefined splits; its evaluation is not dataset-based. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications, or cloud computing instance types) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers (e.g., 'Python 3.8', 'PyTorch 1.9'), needed to replicate the experiment. |
| Experiment Setup | Yes | We plot the iterates of CGO and oCGO for different α, and observe that oCGO converges to the saddle point for $\alpha \ge 0$ (at a very slow rate for $\alpha = 0$) while CGO does so for $\alpha > 0$. The results at $\alpha = 0$ are those of GDA and optimistic GDA. We see similar results for the case where A is the scalar 1, i.e. $f(x, y) = xy$. This is in accordance with the analysis in example (4.1). (...) Figure 2. CGO and optimistic CGO on bilinear functions $f(x, y) = xy$, $(x, y) \in \mathbb{R}^2$: (a, b) and $f(x, y) = x^\top A y$, $x \in \mathbb{R}^4$, $y \in \mathbb{R}^5$: (c, d) for 100 iterations. (...) Figure 3. CGO and optimistic CGO on functions from the family $f(x, y) = \frac{k}{2}(x^2 - y^2) + xy$. $k = 2$: (a, b) and $k = -2$: (c, d) for 100 iterations. (...) Figure 4. CGO and optimistic CGO on the function $f(x, y) = x^2 y + xy$ from multiple initializations for 500 iterations with increasing α. (...) Figure 5. CGO and optimistic CGO on the function $f(x, y) = x^2 y$ from multiple initializations for 500 iterations with increasing α. |
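For context, the $\alpha = 0$ degenerate case that the excerpts above identify with GDA and optimistic GDA is straightforward to simulate on the bilinear test function $f(x, y) = x^\top A y$. The sketch below is not the authors' released code and does not implement the CGO/oCGO updates for $\alpha > 0$ (those update rules are not quoted in this report); it only reproduces the plain GDA and optimistic GDA baselines on the same random bilinear instance, tracking $\|x\|_2$ and $\|y\|_2$ as in Figure 2. The step size `eta` and the random seed are assumptions; the 100-iteration budget matches the figure caption.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption, not from the paper

# Bilinear test function f(x, y) = x^T A y from Section 5.3:
# A = (a_ij) with a_ij ~ N(0, 1), x in R^4, y in R^5 (Figure 2, panels c-d).
m, n = 4, 5
A = rng.standard_normal((m, n))

def grads(x, y):
    # grad_x f = A y, grad_y f = A^T x
    return A @ y, A.T @ x

def gda(x, y, eta=0.05, steps=100):
    """Plain simultaneous gradient descent ascent (the alpha = 0 case of CGO)."""
    norms = []
    for _ in range(steps):
        gx, gy = grads(x, y)
        x, y = x - eta * gx, y + eta * gy  # descend in x, ascend in y
        norms.append((np.linalg.norm(x), np.linalg.norm(y)))
    return norms

def ogda(x, y, eta=0.05, steps=100):
    """Optimistic GDA (the alpha = 0 case of oCGO): extrapolate with the
    previous gradient, x <- x - eta * (2 * g_t - g_{t-1}), likewise for y."""
    gx_prev, gy_prev = grads(x, y)  # initialize so the first step is a GDA step
    norms = []
    for _ in range(steps):
        gx, gy = grads(x, y)
        x = x - eta * (2 * gx - gx_prev)
        y = y + eta * (2 * gy - gy_prev)
        gx_prev, gy_prev = gx, gy
        norms.append((np.linalg.norm(x), np.linalg.norm(y)))
    return norms

x0, y0 = rng.standard_normal(m), rng.standard_normal(n)
print("GDA  final (||x||, ||y||):", gda(x0.copy(), y0.copy())[-1])
print("OGDA final (||x||, ||y||):", ogda(x0.copy(), y0.copy())[-1])
```

On a typical draw, GDA's iterate norms grow (the iterates cycle away from the origin) while optimistic GDA's shrink slowly toward $\|x\|_2, \|y\|_2 \to 0$, matching the $\alpha = 0$ behavior reported in the excerpt. Swapping `grads` for the gradients of the scalar families $f(x, y) = \frac{k}{2}(x^2 - y^2) + xy$ or $f(x, y) = x^2 y + kxy$ gives the corresponding baselines for the setups behind Figures 3 to 5.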