Beyond Lipschitz Smoothness: A Tighter Analysis for Nonconvex Optimization
Authors: Zhengmian Hu, Xidong Wu, Heng Huang
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Thus, an analysis based on separate positive and negative curvatures is more pertinent. Our experiments show that modern neural networks have very unbalanced positive and negative curvatures, making the Lipschitz smoothness assumption not tight. |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of Maryland, College Park, MD, USA. 2Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA, USA. |
| Pseudocode | Yes | Algorithm 1 Lookahead Algorithm |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. |
| Open Datasets | Yes | We train ResNet-18 on CIFAR-10 (Krizhevsky et al., 2009) with Cutout regularization (DeVries & Taylor, 2017) for Figure H.1. |
| Dataset Splits | No | We train ResNet-18 on CIFAR-10 (Krizhevsky et al., 2009) with Cutout regularization (DeVries & Taylor, 2017) for Figure H.1. We use SGD with initial step size γ = 0.1, Lookahead with τ = 5 and the same initial step size, and Lookahead with constant horizon with the same initial τ and γ. |
| Hardware Specification | Yes | This experiment takes less than 2 hours on an NVIDIA Titan Xp graphics card. These two images take less than 4 minutes to generate on an NVIDIA RTX A5000 graphics card. |
| Software Dependencies | No | An open-source framework to compute Hessian information for DNNs by power iteration and stochastic Lanczos method was developed in (Yao et al., 2020). |
| Experiment Setup | Yes | We use SGD with initial step size γ = 0.1, Lookahead with τ = 5 and the same initial step size... At epochs 60, 120, 160, the step size γ for all algorithms decreases by a factor of 5. For Lookahead with constant horizon, τ also increases by a factor of 5 at these epochs. Thus after 160 epochs, Lookahead with constant horizon uses τ = 625. The network is initialized with Kaiming initialization. |
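The Lookahead schedule quoted above (step size γ divided by 5 at each decay point, while the constant-horizon variant multiplies the sync period τ by 5 so that γτ stays fixed, ending at τ = 625) can be sketched as follows. This is an illustrative toy, not the paper's code: the objective f(w) = w², the step counts, and the slow-weight coefficient α = 0.5 are assumptions standing in for ResNet-18 training.

```python
# Minimal sketch of Lookahead (Zhang et al., 2019) with the
# constant-horizon schedule described in the table. The quadratic
# f(w) = w^2 is a hypothetical stand-in for the training loss.

def grad(w):
    # Gradient of the toy objective f(w) = w^2.
    return 2.0 * w

def lookahead_sgd(w0, gamma, tau, alpha, steps):
    """Run `steps` SGD updates on the fast weights, syncing the slow
    weights every `tau` steps via slow += alpha * (fast - slow)."""
    slow = fast = w0
    for t in range(1, steps + 1):
        fast -= gamma * grad(fast)           # inner SGD step
        if t % tau == 0:                     # synchronization point
            slow += alpha * (fast - slow)    # slow-weight update
            fast = slow                      # restart fast weights
    return slow

# Schedule from the table: gamma starts at 0.1 and is divided by 5
# at each decay point; the constant-horizon variant multiplies tau
# by 5 at the same points, keeping gamma * tau constant. Three decay
# points (epochs 60, 120, 160) take tau from 5 to 625.
gamma, tau = 0.1, 5
w = 1.0
for phase in range(3):
    w = lookahead_sgd(w, gamma, tau, alpha=0.5, steps=tau * 4)
    gamma /= 5
    tau *= 5
```

Note the design point the "constant horizon" name captures: because γ shrinks by the same factor that τ grows, the slow weights always aggregate roughly the same amount of parameter movement between synchronizations.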