On the Last-Iterate Convergence of Shuffling Gradient Methods
Authors: Zijian Liu, Zhengyuan Zhou
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | To bridge this gap between practice and theory, we prove the first last-iterate convergence rates for shuffling gradient methods with respect to the objective value even without strong convexity. Our new results either (nearly) match the existing last-iterate lower bounds or are as fast as the previous best upper bounds for the average iterate. |
| Researcher Affiliation | Collaboration | ¹Stern School of Business, New York University; ²Arena Technologies. Correspondence to: Zijian Liu <zl3067@stern.nyu.edu>, Zhengyuan Zhou <zhengyuanzhou24@gmail.com>. |
| Pseudocode | Yes | Algorithm 1 (Proximal Shuffling Gradient Method). Input: initial point $x_1 \in \operatorname{dom}\psi$, stepsizes $\eta_k > 0$. For $k = 1$ to $K$: generate a permutation $\sigma_k = (\sigma_k^i : i \in [n])$ of $[n]$ and set $x_k^1 = x_k$; for $i = 1$ to $n$: $x_k^{i+1} = x_k^i - \eta_k \nabla f_{\sigma_k^i}(x_k^i)$; then $x_{k+1} = \operatorname{argmin}_{x \in \mathbb{R}^d}\, n\psi(x) + \frac{\lVert x - x_k^{n+1} \rVert^2}{2\eta_k}$. Return $x_{K+1}$. (A runnable sketch is given after this table.) |
| Open Source Code | No | The paper does not provide a link or an explicit statement about the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and focuses on convergence rates and proofs, not empirical evaluation on specific datasets. Therefore, it does not mention or provide access to any training datasets. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical validation with dataset splits. Thus, it does not provide details on training/validation/test splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any experiments that would require specific hardware. Therefore, no hardware specifications are provided. |
| Software Dependencies | No | The paper is theoretical and does not describe any experiments requiring specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on mathematical proofs and convergence rates, not an experimental setup with hyperparameters or training configurations. |
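
Below is a minimal Python sketch of Algorithm 1 as transcribed above. Since no official code is released, this is an illustrative reconstruction, not the authors' implementation; the names `grad_fs`, `prox_psi`, and `etas` are hypothetical placeholders for the component gradients $\nabla f_i$, the proximal map of $\psi$, and the stepsize schedule $\eta_k$.

```python
import numpy as np

def proximal_shuffling_gradient(x1, grad_fs, prox_psi, etas, K, seed=0):
    """Sketch of Algorithm 1 (Proximal Shuffling Gradient Method).

    x1       : initial point in dom(psi), array of shape (d,)
    grad_fs  : list of n callables; grad_fs[i](x) returns the gradient of f_i at x
    prox_psi : callable prox_psi(v, t) = argmin_x { t * psi(x) + ||x - v||^2 / 2 }
    etas     : callable etas(k) giving the stepsize eta_k > 0 for epoch k
    K        : number of epochs
    """
    n = len(grad_fs)
    x = np.asarray(x1, dtype=float)
    rng = np.random.default_rng(seed)
    for k in range(1, K + 1):
        eta = etas(k)
        perm = rng.permutation(n)           # permutation sigma_k of [n]
        y = x.copy()                        # inner iterate x_k^1 = x_k
        for i in perm:                      # one pass over the shuffled components
            y = y - eta * grad_fs[i](y)     # x_k^{i+1} = x_k^i - eta_k * grad f_{sigma_k^i}(x_k^i)
        # Proximal step: argmin_x n*psi(x) + ||x - x_k^{n+1}||^2 / (2*eta_k),
        # which equals prox_{n*eta_k*psi}(x_k^{n+1}).
        x = prox_psi(y, n * eta)
    return x                                # last iterate x_{K+1}
```

As one concrete (assumed) instantiation, for $\psi(x) = \lambda \lVert x \rVert_1$ the proximal map is soft-thresholding, e.g. `prox_psi = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t * lam, 0.0)`; the paper's last-iterate guarantees concern the returned $x_{K+1}$ rather than any averaged iterate.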