Beyond Lipschitz Smoothness: A Tighter Analysis for Nonconvex Optimization

Authors: Zhengmian Hu, Xidong Wu, Heng Huang

ICML 2023

Reproducibility variables, with the result and the supporting LLM response for each:
Research Type: Experimental
"Thus, an analysis based on separate positive and negative curvatures is more pertinent. Our experiments show that modern neural networks have very unbalanced positive and negative curvatures, making the Lipschitz smoothness assumption not tight."
Researcher Affiliation: Academia
"(1) Department of Computer Science, University of Maryland, College Park, MD, USA. (2) Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA, USA."
Pseudocode: Yes
"Algorithm 1 Lookahead Algorithm"
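The paper's Algorithm 1 restates Lookahead (Zhang et al., 2019): an inner optimizer takes τ fast steps, after which the slow weights are pulled toward the fast weights. Below is a minimal sketch assuming SGD as the inner optimizer and an interpolation coefficient alpha = 0.5; the excerpts in this report only fix γ = 0.1 and τ = 5.

```python
import torch

def lookahead_sgd(model, loss_fn, data_iter, gamma=0.1, tau=5, alpha=0.5, steps=200):
    """Minimal Lookahead sketch: tau fast SGD steps, then interpolate the
    slow weights toward the fast weights (alpha = 0.5 is an assumed default)."""
    slow = [p.detach().clone() for p in model.parameters()]
    fast_opt = torch.optim.SGD(model.parameters(), lr=gamma)
    for step in range(1, steps + 1):
        x, y = next(data_iter)
        fast_opt.zero_grad()
        loss_fn(model(x), y).backward()
        fast_opt.step()                      # fast-weight update
        if step % tau == 0:                  # synchronize every tau steps
            for p_slow, p_fast in zip(slow, model.parameters()):
                p_slow += alpha * (p_fast.detach() - p_slow)  # slow update
                p_fast.data.copy_(p_slow)    # restart fast weights from slow
    return model
```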
Open Source Code: No
"The paper does not provide concrete access to source code for the methodology it describes."
Open Datasets: Yes
"We train ResNet-18 on CIFAR-10 (Krizhevsky et al., 2009) with Cutout regularization (DeVries & Taylor, 2017) for Figure H.1."
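Cutout (DeVries & Taylor, 2017) zeroes a random square patch of each training image. A minimal sketch of the transform and the CIFAR-10 pipeline it plugs into; the 16x16 patch size and the surrounding augmentation choices are assumptions, not stated in the excerpt.

```python
import torch
import torchvision
import torchvision.transforms as T

class Cutout:
    """Zero out one random square patch per image (DeVries & Taylor, 2017)."""
    def __init__(self, size=16):             # 16x16 is an assumed CIFAR-10 patch size
        self.size = size

    def __call__(self, img):                 # img: CxHxW float tensor
        _, h, w = img.shape
        cy = torch.randint(h, (1,)).item()   # random patch center
        cx = torch.randint(w, (1,)).item()
        y0, y1 = max(0, cy - self.size // 2), min(h, cy + self.size // 2)
        x0, x1 = max(0, cx - self.size // 2), min(w, cx + self.size // 2)
        img[:, y0:y1, x0:x1] = 0.0
        return img

train_tf = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    Cutout(size=16),
])
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=train_tf)
```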
Dataset Splits: No
"We train ResNet-18 on CIFAR-10 (Krizhevsky et al., 2009) with Cutout regularization (DeVries & Taylor, 2017) for Figure H.1. We use SGD with initial step size γ = 0.1, Lookahead with τ = 5 and the same initial step size, and Lookahead with constant horizon with the same initial τ and γ."
Hardware Specification: Yes
"This experiment takes less than 2 hours on an NVIDIA Titan Xp graphics card. These two images take less than 4 minutes to generate on an NVIDIA RTX A5000 graphics card."
Software Dependencies: No
"An open-source framework to compute Hessian information for DNNs by power iteration and the stochastic Lanczos method was developed in (Yao et al., 2020)."
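The framework referenced here is PyHessian (Yao et al., 2020). A minimal sketch of the power-iteration idea it implements: the dominant Hessian eigenvalue is estimated from Hessian-vector products, which torch.autograd supplies without materializing the Hessian. The function below is illustrative and is not PyHessian's actual API.

```python
import torch

def top_hessian_eigenvalue(loss, params, iters=20):
    """Power iteration for the largest-magnitude Hessian eigenvalue of `loss`
    w.r.t. `params`, using Hessian-vector products (Pearlmutter's trick)."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    eig = 0.0
    for _ in range(iters):
        norm = torch.sqrt(sum((vi ** 2).sum() for vi in v))
        v = [vi / norm for vi in v]          # normalize the iterate
        gv = sum((g * vi).sum() for g, vi in zip(grads, v))
        hv = torch.autograd.grad(gv, params, retain_graph=True)   # Hv
        eig = sum((h * vi).sum() for h, vi in zip(hv, v)).item()  # Rayleigh quotient
        v = [h.detach() for h in hv]
    return eig
```

Power iteration converges to the eigenvalue of largest magnitude; rerunning it on the shifted operator H − λ₁I (with λ₁ the value just found) exposes the opposite end of the spectrum, which is one way to probe the separate positive and negative curvatures the paper emphasizes.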
Experiment Setup: Yes
"We use SGD with initial step size γ = 0.1, Lookahead with τ = 5 and the same initial step size... At epochs 60, 120, 160, the step size γ for all algorithms decreases by a factor of 5. For Lookahead with constant horizon, τ also increases by a factor of 5 at these epochs. Thus after 160 epochs, Lookahead with constant horizon uses τ = 625. The network is initialized with Kaiming initialization."
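The constant-horizon variant keeps the product γ·τ fixed: γ is divided by 5 and τ multiplied by 5 at each of epochs 60, 120, and 160, so τ = 5 · 5³ = 625 after the last milestone. A minimal sketch of this schedule (the function name and signature are illustrative):

```python
def schedule(epoch, gamma0=0.1, tau0=5, milestones=(60, 120, 160)):
    """Step size and Lookahead horizon at a given epoch: gamma shrinks 5x and
    (for the constant-horizon variant) tau grows 5x at each milestone."""
    k = sum(epoch >= m for m in milestones)   # milestones already passed
    return gamma0 / 5 ** k, tau0 * 5 ** k

assert schedule(0) == (0.1, 5)
assert schedule(160) == (0.1 / 125, 625)      # tau = 5 * 5**3 = 625
```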