Tuning-Free Stochastic Optimization

Authors: Ahmed Khaled, Chi Jin

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We prove that for the task of minimizing a convex and smooth or Lipschitz function over an unbounded domain, tuning-free optimization is impossible. We discuss conditions under which tuning-free optimization is possible even over unbounded domains. In particular, we show that the recently proposed DoG and DoWG algorithms are tuning-free when the noise distribution is sufficiently well-behaved. For the task of finding a stationary point of a smooth and potentially nonconvex function, we give a variant of SGD that matches the best-known high-probability convergence rate for tuned SGD at only an additional polylogarithmic cost. However, we also give an impossibility result showing that no algorithm can hope to match the optimal expected convergence rate for tuned SGD with high probability. (A sketch of the basic DoG step-size rule appears after this table.)
Researcher Affiliation | Academia | Ahmed Khaled and Chi Jin, Electrical and Computer Engineering Department, Princeton University, Princeton, NJ, USA.
Pseudocode | Yes | Algorithm 1 (Restarted SGD), Algorithm 2 (T-DoG + Variance Estimation), Algorithm 3 (T-DoWG + Variance Estimation), Algorithm 4 (FindLeader(S, δ, K))
Open Source Code | No | The paper does not include any statement or link indicating that source code for the described methodology is publicly available.
Open Datasets | No | The paper is theoretical and does not conduct experiments involving datasets.
Dataset Splits | No | The paper is theoretical and does not conduct experiments, so no dataset splits for validation are mentioned.
Hardware Specification | No | The paper is theoretical and does not report experimental results that would require specific hardware specifications.
Software Dependencies | No | The paper is theoretical and does not report experimental results that would require specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with specific hyperparameters or training configurations.
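
For context on the DoG and DoWG rows above: DoG ("Distance over Gradients", Ivgi, Hinder, and Carmon, 2023) sets its step size using only quantities observed during the run, eta_t = max_{i <= t} ||x_i - x_0|| / sqrt(sum_{i <= t} ||g_i||^2), so there is no learning rate to tune. The Python sketch below illustrates this basic rule under stated assumptions; it is not the paper's T-DoG + Variance Estimation (Algorithm 2). The function name dog_sgd, the r_eps default, the numerical guard on the denominator, and returning the last iterate (the theory works with an averaged iterate) are illustrative choices, not taken from the paper.

import numpy as np

def dog_sgd(grad_fn, x0, num_steps, r_eps=1e-6):
    # Sketch of the basic DoG step-size rule (not the paper's T-DoG variant).
    # grad_fn(x) returns a stochastic gradient at x; r_eps is a small initial
    # movement scale, the only quantity the user supplies besides x0.
    x = np.asarray(x0, dtype=float).copy()
    x_init = x.copy()
    r_bar = float(r_eps)      # running maximum distance from the starting point
    grad_sq_sum = 0.0         # running sum of squared stochastic-gradient norms
    for _ in range(num_steps):
        g = np.asarray(grad_fn(x), dtype=float)
        grad_sq_sum += float(g @ g)
        r_bar = max(r_bar, float(np.linalg.norm(x - x_init)))
        eta = r_bar / max(np.sqrt(grad_sq_sum), 1e-12)  # "distance over gradients"
        x = x - eta * g
    return x  # last iterate for brevity; the analysis averages the iterates

# Example usage on noisy gradients of f(x) = 0.5 * ||x||^2.
rng = np.random.default_rng(0)
noisy_grad = lambda x: x + 0.1 * rng.standard_normal(x.shape)
x_out = dog_sgd(noisy_grad, x0=np.ones(10), num_steps=1000)

The point of the sketch is that the step size is computed entirely from the observed trajectory; the paper's contribution concerns when such rules provably match tuned SGD and when no tuning-free algorithm can.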