Competitive Gradient Optimization

Authors: Abhijeet Vyas, Brian Bullins, Kamyar Azizzadenesheli

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide a continuous-time analysis of CGO and its convergence properties while showing that in the continuous limit, previous methods degenerate to their gradient descent ascent (GDA) variants. We further provide a rate of convergence to stationary points in the discrete-time setting. We propose a generalized class of α-coherent functions and show that for strictly α-coherent functions, CGO ensures convergence to a saddle point. Moreover, we propose optimistic CGO (oCGO), an optimistic variant, for which we show a convergence rate of O(1/n) to saddle points for α-coherent functions. (...) Section 5.3. Simulation of CGO and oCGO on families from Section 4: We now evaluate the performance of CGO and oCGO on the families discussed in examples (4.1) and (4.2). We first consider the function f(x, y) = x^T A y, where A ∈ R^{m×n}, x ∈ R^m, y ∈ R^n. We sample all the entries of A independently from a standard Gaussian, A = (a_ij), a_ij ~ N(0, 1). We plot the L2 norm of x against that of y; since the only saddle point is the origin, the desired solution is ||x||_2, ||y||_2 → 0. We plot the iterates of CGO and oCGO for different α, and observe that oCGO converges to the saddle point for α ≥ 0 (at a very slow rate for α = 0), while CGO does so for α > 0. The results at α = 0 are those of GDA and optimistic GDA. We see similar results for the case where A is the scalar 1, i.e. f(x, y) = xy. This is in accordance with the analysis in example (4.1). The code for the experiments is available at Link to code. (A sketch of the α = 0 baselines on this bilinear testbed appears below the table.)
Researcher Affiliation | Collaboration | 1 Department of Computer Science, Purdue University, West Lafayette, IN 47907; 2 Nvidia, Santa Clara, CA 95050. Correspondence to: Abhijeet Vyas <vyas26@purdue.edu>.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It describes the update rules using mathematical equations and text.
Open Source Code | Yes | The code for the experiments is available at Link to code.
Open Datasets | No | The paper uses mathematical functions for simulations, defined within the paper (e.g., f(x, y) = x^T A y, f(x, y) = xy, f(x, y) = (k/2)(x^2 − y^2) + xy, f(x, y) = x^2 y + kxy). It does not provide concrete access information (link, DOI, formal citation) for a publicly available or open dataset.
Dataset Splits | No | The paper simulates algorithms on mathematical functions and does not discuss dataset splits (training, validation, test) in terms of percentages, counts, or references to predefined splits; the evaluation is not dataset-based in the typical sense.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications, or cloud computing instance types) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers (e.g., Python 3.8, PyTorch 1.9), needed to replicate the experiment.
Experiment Setup | Yes | We plot the iterates of CGO and oCGO for different α, and observe that oCGO converges to the saddle point for α ≥ 0 (at a very slow rate for α = 0) while CGO does so for α > 0. The results at α = 0 are those of GDA and optimistic GDA. We see similar results for the case where A is the scalar 1, i.e. f(x, y) = xy. This is in accordance with the analysis in example (4.1). (...) Figure 2. CGO and optimistic CGO on bilinear functions f(x, y) = xy, (x, y) ∈ R^2 (a,b), and f(x, y) = x^T A y, x ∈ R^4, y ∈ R^5 (c,d), for 100 iterations. (...) Figure 3. CGO and optimistic CGO on functions from the family f(x, y) = (k/2)(x^2 − y^2) + xy; k = 2 (a,b) and k = −2 (c,d), for 100 iterations. (...) Figure 4. CGO and optimistic CGO on the function f(x, y) = x^2 y + xy from multiple initializations for 500 iterations with increasing α. (...) Figure 5. CGO and optimistic CGO on the function f(x, y) = x^2 y from multiple initializations for 500 iterations with increasing α. (The function families are encoded as gradient oracles in the second sketch below.)
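The CGO and oCGO update rules are given as equations in the paper and are not quoted in this report, so the sketch below covers only the α = 0 baselines named above (plain GDA and optimistic GDA) on the bilinear testbed f(x, y) = x^T A y from Section 5.3. It is a minimal reconstruction, not the authors' code: the step size, seed, and the use of simultaneous updates are assumptions chosen to mirror the 100-iteration setup of Figure 2.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption, not from the paper

# Bilinear testbed from Section 5.3: f(x, y) = x^T A y with Gaussian entries.
m, n = 4, 5                      # dimensions used in Figure 2 (c, d)
A = rng.standard_normal((m, n))  # a_ij ~ N(0, 1)

def grads(x, y):
    # For f(x, y) = x^T A y: grad_x f = A y and grad_y f = A^T x.
    return A @ y, A.T @ x

def gda(x, y, eta=0.05, iters=100):
    """Plain gradient descent ascent (the alpha = 0 case of CGO)."""
    for _ in range(iters):
        gx, gy = grads(x, y)
        x, y = x - eta * gx, y + eta * gy  # simultaneous updates
    return np.linalg.norm(x), np.linalg.norm(y)

def ogda(x, y, eta=0.05, iters=100):
    """Optimistic GDA (the alpha = 0 case of oCGO):
    x_{t+1} = x_t - 2*eta*g_t + eta*g_{t-1}, and symmetrically for y."""
    gx_prev, gy_prev = grads(x, y)
    for _ in range(iters):
        gx, gy = grads(x, y)
        x = x - 2 * eta * gx + eta * gx_prev
        y = y + 2 * eta * gy - eta * gy_prev
        gx_prev, gy_prev = gx, gy
    return np.linalg.norm(x), np.linalg.norm(y)

x0, y0 = rng.standard_normal(m), rng.standard_normal(n)
print("GDA  final (||x||_2, ||y||_2):", gda(x0, y0))   # norms grow: no convergence
print("OGDA final (||x||_2, ||y||_2):", ogda(x0, y0))  # norms shrink slowly toward 0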
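On this testbed the GDA iterate norms drift away from the origin while the optimistic GDA norms contract toward it, matching the α = 0 behavior described in the quoted excerpt. For the remaining families (Figures 3-5), the objectives and their closed-form gradients can be written down directly from the reconstructed captions; the sketch below defines them as oracles reusable with any first-order update. The constants k and c follow those captions, the gradients are derived by hand rather than copied from the paper, and the demo step size is an assumption.

```python
import numpy as np

# Objectives and hand-derived gradients for the families in Figures 3-5.
def family_quadratic(k):
    # f(x, y) = (k/2)(x^2 - y^2) + x*y  (Figure 3; k = 2 and k = -2).
    f  = lambda x, y: 0.5 * k * (x**2 - y**2) + x * y
    fx = lambda x, y: k * x + y        # df/dx
    fy = lambda x, y: -k * y + x       # df/dy
    return f, fx, fy

def family_cubic(c):
    # f(x, y) = x^2 * y + c*x*y  (Figure 4: c = 1; Figure 5: c = 0).
    f  = lambda x, y: x**2 * y + c * x * y
    fx = lambda x, y: 2 * x * y + c * y  # df/dx
    fy = lambda x, y: x**2 + c * x       # df/dy
    return f, fx, fy

def gda_scalar(fx, fy, x, y, eta=0.05, iters=100):
    # alpha = 0 baseline (plain GDA) on a scalar min-max problem.
    for _ in range(iters):
        x, y = x - eta * fx(x, y), y + eta * fy(x, y)
    return x, y

# Demo: on the k = 2 quadratic family even plain GDA contracts to (0, 0),
# while for k = -2 plain GDA spirals outward (the regime where the paper's
# alpha > 0 corrections matter).
rng = np.random.default_rng(0)
_, fx, fy = family_quadratic(k=2.0)
for x0, y0 in rng.uniform(-1.0, 1.0, size=(3, 2)):  # multiple initializations
    print(gda_scalar(fx, fy, x0, y0))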