On the Last-Iterate Convergence of Shuffling Gradient Methods
Authors: Zijian Liu, Zhengyuan Zhou
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | To bridge this gap between practice and theory, we prove the first last-iterate convergence rates for shuffling gradient methods with respect to the objective value even without strong convexity. Our new results either (nearly) match the existing last-iterate lower bounds or are as fast as the previous best upper bounds for the average iterate. |
| Researcher Affiliation | Collaboration | ¹Stern School of Business, New York University; ²Arena Technologies. Correspondence to: Zijian Liu <zl3067@stern.nyu.edu>, Zhengyuan Zhou <zhengyuanzhou24@gmail.com>. |
| Pseudocode | Yes | Algorithm 1 (Proximal Shuffling Gradient Method). Input: initial point $x_1 \in \operatorname{dom}\psi$, stepsizes $\eta_k > 0$. For $k = 1$ to $K$: generate a permutation $\sigma_k = (\sigma_k^i : i \in [n])$ of $[n]$ and set $x_k^1 = x_k$; for $i = 1$ to $n$: $x_k^{i+1} = x_k^i - \eta_k \nabla f_{\sigma_k^i}(x_k^i)$; then $x_{k+1} = \operatorname{argmin}_{x \in \mathbb{R}^d}\, n\psi(x) + \frac{\lVert x - x_k^{n+1} \rVert^2}{2\eta_k}$. Return $x_{K+1}$. (A runnable sketch is given after this table.) |
| Open Source Code | No | The paper does not provide a link or an explicit statement about the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and focuses on convergence rates and proofs, not empirical evaluation on specific datasets. Therefore, it does not mention or provide access to any training datasets. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical validation with dataset splits. Thus, it does not provide details on training/validation/test splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any experiments that would require specific hardware. Therefore, no hardware specifications are provided. |
| Software Dependencies | No | The paper is theoretical and does not describe any experiments requiring specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on mathematical proofs and convergence rates, not an experimental setup with hyperparameters or training configurations. |
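
Below is a minimal Python sketch of Algorithm 1 as transcribed above. Since no official code is released, this is an illustrative reconstruction, not the authors' implementation; the names `grad_fs`, `prox_psi`, and `etas` are hypothetical placeholders for the component gradients $\nabla f_i$, the proximal map of $\psi$, and the stepsize schedule $\eta_k$.

```python
import numpy as np

def proximal_shuffling_gradient(x1, grad_fs, prox_psi, etas, K, seed=0):
    """Sketch of Algorithm 1 (Proximal Shuffling Gradient Method).

    x1       : initial point in dom(psi), array of shape (d,)
    grad_fs  : list of n callables; grad_fs[i](x) returns the gradient of f_i at x
    prox_psi : callable prox_psi(v, t) = argmin_x { t * psi(x) + ||x - v||^2 / 2 }
    etas     : callable etas(k) giving the stepsize eta_k > 0 for epoch k
    K        : number of epochs
    """
    n = len(grad_fs)
    x = np.asarray(x1, dtype=float)
    rng = np.random.default_rng(seed)
    for k in range(1, K + 1):
        eta = etas(k)
        perm = rng.permutation(n)           # permutation sigma_k of [n]
        y = x.copy()                        # inner iterate x_k^1 = x_k
        for i in perm:                      # one pass over the shuffled components
            y = y - eta * grad_fs[i](y)     # x_k^{i+1} = x_k^i - eta_k * grad f_{sigma_k^i}(x_k^i)
        # Proximal step: argmin_x n*psi(x) + ||x - x_k^{n+1}||^2 / (2*eta_k),
        # which equals prox_{n*eta_k*psi}(x_k^{n+1}).
        x = prox_psi(y, n * eta)
    return x                                # last iterate x_{K+1}
```

As one concrete (assumed) instantiation, for $\psi(x) = \lambda \lVert x \rVert_1$ the proximal map is soft-thresholding, e.g. `prox_psi = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t * lam, 0.0)`; the paper's last-iterate guarantees concern the returned $x_{K+1}$ rather than any averaged iterate.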