Sampling without Replacement Leads to Faster Rates in Finite-Sum Minimax Optimization
Authors: Aniket Das, Bernhard Schölkopf, Michael Muehlebach
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our theoretical results by benchmarking on finite-sum SC-SC quadratic minimax games. This class of problems appears in several applications such as reinforcement learning [11], robust regression [12], and online learning [26]. The objective $F$ and the components $f_i$ are given by $F(x, y) = \frac{1}{n}\sum_{i=1}^{n} f_i(x, y) = \frac{1}{2}x^\top A x + x^\top B y - \frac{1}{2}y^\top C y - u^\top x - v^\top y$ and $f_i(x, y) = \frac{1}{2}x^\top A_i x + x^\top B_i y - \frac{1}{2}y^\top C_i y - u_i^\top x - v_i^\top y$, where $A$ and $C$ are strictly positive definite. We generate the components $f_i$ randomly, such that $\sum_{i=1}^{n} u_i = \sum_{i=1}^{n} v_i = 0$ and the expected singular values of $B$ are larger than those of $A$ and $C$. This ensures that the bilinear coupling term $x^\top B y$ is sufficiently strong, since a weak coupling practically reduces to quadratic minimization, which has already been investigated in prior works. Finally, to investigate how the presence of nonconvex-nonconcave components impacts convergence, a few randomly chosen $f_i$'s are allowed to be nonconvex-nonconcave quadratics. For each algorithm analyzed in the text, we benchmark sampling without replacement against uniform sampling by running each method for 100 epochs using constant step-sizes that are selected independently for each method via grid search. Further details regarding the setup are discussed in Appendix G. We present our results in Figure 2, where we plot the relative distance of the epoch iterates from the minimax point, defined as $\|z_0^k - z^*\|^2 / \|z_0 - z^*\|^2$, averaged over 50 independent runs. In agreement with our theoretical findings, sampling without replacement consistently outperforms uniform sampling. |
| Researcher Affiliation | Collaboration | Aniket Das, Indian Institute of Technology Kanpur (aniketd@iitk.ac.in); Bernhard Schölkopf, Max Planck Institute for Intelligent Systems (bs@tuebingen.mpg.de); Michael Muehlebach, Max Planck Institute for Intelligent Systems (michaelm@tuebingen.mpg.de). Aniket Das is currently at Google Research; contact: ketd@google.com |
| Pseudocode | Yes | Algorithm 1: GDA-RR/SO/AS; Algorithm 2: PPM-RR/SO/AS; Algorithm 3: AGDA-RR; Algorithm 4: AGDA-AS |
| Open Source Code | Yes | 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] They are included in the supplementary material |
| Open Datasets | No | The paper states: 'We generate the components $f_i$ randomly, such that $\sum_{i=1}^{n} u_i = \sum_{i=1}^{n} v_i = 0$ and the expected singular values of $B$ are larger than those of $A$ and $C$.' This indicates that the authors generated their own synthetic data for the experiments rather than using a publicly available or open dataset. No specific public dataset or access information is provided. |
| Dataset Splits | No | The paper mentions 'running each method for 100 epochs' and averaging results over '50 runs' or '20 independently sampled quadratic games,' but it does not specify explicit training, validation, or test dataset splits in terms of percentages or sample counts. The data used for experiments is generated randomly, so traditional splits are not applicable in the same way. |
| Hardware Specification | No | The paper mentions in the checklist that 'compute and the type of resources used' details are 'provided in the appendix,' but the main paper does not specify any particular GPU models, CPU models, or other hardware specifications. Without access to the appendix, this information is not available in the main text. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9'). |
| Experiment Setup | Yes | For each algorithm analyzed in the text, we benchmark sampling without replacement against uniform sampling by running each method for 100 epochs using constant step-sizes that are selected independently for each method via grid search. Further details regarding the setup are discussed in Appendix G. |
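The benchmark described above (finite-sum SC-SC quadratic games, epoch-wise random reshuffling vs. uniform with-replacement sampling, relative distance to the saddle point) can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the authors' released code: the problem sizes, step-size, matrix-generation scheme, and the omission of nonconvex-nonconcave components and grid search are all simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 16, 5  # number of components and dimension (illustrative, not the paper's)

# Components f_i(x,y) = 1/2 x^T A_i x + x^T B_i y - 1/2 y^T C_i y - u_i^T x - v_i^T y,
# with the linear terms centered so that sum_i u_i = sum_i v_i = 0.
def sym(M):
    return (M + M.T) / 2

A_i = [2 * np.eye(d) + 0.3 * sym(rng.standard_normal((d, d))) for _ in range(n)]
C_i = [2 * np.eye(d) + 0.3 * sym(rng.standard_normal((d, d))) for _ in range(n)]
B_i = [4 * rng.standard_normal((d, d)) for _ in range(n)]  # strong bilinear coupling
u_i = rng.standard_normal((n, d)); u_i -= u_i.mean(axis=0)
v_i = rng.standard_normal((n, d)); v_i -= v_i.mean(axis=0)

# Since the linear terms average to zero, the saddle point of F is z* = 0
# (it solves A x + B y = 0 and B^T x - C y = 0 with A, C strictly PD on average).
z_star = np.zeros(2 * d)

def grad(i, x, y):
    """Stochastic gradient of component f_i at (x, y)."""
    gx = A_i[i] @ x + B_i[i] @ y - u_i[i]
    gy = B_i[i].T @ x - C_i[i] @ y - v_i[i]
    return gx, gy

def run(shuffle, epochs=100, eta=5e-3):
    """Simultaneous stochastic GDA; returns ||z_0^k - z*||^2 / ||z_0 - z*||^2."""
    x, y = np.ones(d), np.ones(d)
    z0 = np.concatenate([x, y])
    for _ in range(epochs):
        # Random reshuffling draws a fresh permutation each epoch;
        # uniform sampling draws n indices i.i.d. with replacement.
        order = rng.permutation(n) if shuffle else rng.integers(0, n, size=n)
        for i in order:
            gx, gy = grad(i, x, y)
            x, y = x - eta * gx, y + eta * gy  # descent in x, ascent in y
    z = np.concatenate([x, y])
    return np.linalg.norm(z - z_star) ** 2 / np.linalg.norm(z0 - z_star) ** 2

print("RR     :", run(shuffle=True))
print("uniform:", run(shuffle=False))
```

Averaging the final relative distance over many independently sampled games (the paper uses 50 runs) is what produces the comparison plotted in the paper's Figure 2; a single seed, as here, only illustrates the mechanics of the two sampling schemes.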