On the Last-Iterate Convergence of Shuffling Gradient Methods

Authors: Zijian Liu, Zhengyuan Zhou

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | To bridge this gap between practice and theory, we prove the first last-iterate convergence rates for shuffling gradient methods with respect to the objective value even without strong convexity. Our new results either (nearly) match the existing last-iterate lower bounds or are as fast as the previous best upper bounds for the average iterate.
Researcher Affiliation | Collaboration | (1) Stern School of Business, New York University; (2) Arena Technologies. Correspondence to: Zijian Liu <zl3067@stern.nyu.edu>, Zhengyuan Zhou <zhengyuanzhou24@gmail.com>.
Pseudocode | Yes | Algorithm 1 (Proximal Shuffling Gradient Method). Input: initial point x_1 ∈ dom ψ, stepsizes η_k > 0. For k = 1 to K: generate a permutation (σ_k^i : i ∈ [n]) of [n] and set x_k^1 = x_k; for i = 1 to n: x_k^{i+1} = x_k^i − η_k ∇f_{σ_k^i}(x_k^i); then x_{k+1} = argmin_{x ∈ R^d} { n ψ(x) + ‖x − x_k^{n+1}‖^2 / (2η_k) }. Return x_{K+1}. (A hedged Python sketch of this update appears after the table.)
Open Source Code | No | The paper does not provide a link or an explicit statement about the availability of open-source code for the described methodology.
Open Datasets | No | The paper is theoretical and focuses on convergence rates and proofs, not empirical evaluation on specific datasets. Therefore, it does not mention or provide access to any training datasets.
Dataset Splits | No | The paper is theoretical and does not involve empirical validation with dataset splits, so it provides no details on training/validation/test splits.
Hardware Specification | No | The paper is theoretical and does not describe any experiments that would require specific hardware, so no hardware specifications are provided.
Software Dependencies | No | The paper is theoretical and does not describe any experiments requiring specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and focuses on mathematical proofs and convergence rates, not an experimental setup with hyperparameters or training configurations.
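
For readers who want to see Algorithm 1 concretely, below is a minimal Python sketch of one run of the proximal shuffling gradient update. The interface is assumed for illustration only: grad_f(i, x) returns the gradient of the i-th component, eta(k) is the epoch-k step size, and ψ is taken to be an ℓ1 regularizer λ‖x‖_1 so that the proximal step reduces to soft-thresholding. None of these choices come from the paper, which provides no code.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 (illustrative choice of psi, not from the paper).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_shuffling_gradient(grad_f, n, x1, eta, K, lam=0.1, rng=None):
    """Sketch of Algorithm 1 (Proximal Shuffling Gradient Method).

    grad_f(i, x): gradient of the i-th component f_i at x (assumed interface).
    n:            number of components.
    x1:           initial point in dom(psi).
    eta(k):       step size for epoch k (assumed callable).
    K:            number of epochs.
    lam:          weight of the illustrative regularizer psi(x) = lam * ||x||_1.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x1, dtype=float)
    for k in range(1, K + 1):
        eta_k = eta(k)
        y = x.copy()                          # y plays the role of x_k^i
        for i in rng.permutation(n):          # visit components in the shuffled order sigma_k
            y = y - eta_k * grad_f(i, y)      # x_k^{i+1} = x_k^i - eta_k * grad f_{sigma_k^i}(x_k^i)
        # Proximal step: x_{k+1} = argmin_x  n*psi(x) + ||x - x_k^{n+1}||^2 / (2*eta_k);
        # for psi(x) = lam*||x||_1 this is soft-thresholding with threshold n*lam*eta_k.
        x = soft_threshold(y, n * lam * eta_k)
    return x

# Entirely illustrative usage: l1-regularized least squares with f_i(x) = 0.5*(a_i^T x - b_i)^2.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(50, 5)), rng.normal(size=50)
grad = lambda i, x: (A[i] @ x - b[i]) * A[i]
x_hat = proximal_shuffling_gradient(grad, n=50, x1=np.zeros(5), eta=lambda k: 0.01 / k, K=20)
```

The ℓ1 regularizer is used only to make the proximal step explicit; any ψ with a computable proximal operator fits the same template, with the soft-thresholding call replaced by the corresponding prox evaluated at x_k^{n+1} with parameter n·η_k.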