Beyond Lipschitz Smoothness: A Tighter Analysis for Nonconvex Optimization
Authors: Zhengmian Hu, Xidong Wu, Heng Huang
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Thus, an analysis based on separate positive and negative curvatures is more pertinent. Our experiments show that modern neural networks have very unbalanced positive and negative curvatures, making the Lipschitz smoothness assumption not tight. |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of Maryland, College Park, MD, USA. 2Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA, USA. |
| Pseudocode | Yes | Algorithm 1 Lookahead Algorithm |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. |
| Open Datasets | Yes | We train ResNet-18 on CIFAR-10 (Krizhevsky et al., 2009) with Cutout regularization (DeVries & Taylor, 2017) for Figure H.1. |
| Dataset Splits | No | We train ResNet-18 on CIFAR-10 (Krizhevsky et al., 2009) with Cutout regularization (DeVries & Taylor, 2017) for Figure H.1. We use SGD with initial step size γ = 0.1, Lookahead with τ = 5 and the same initial step size, and Lookahead with constant horizon with the same initial τ and γ. |
| Hardware Specification | Yes | This experiment takes less than 2 hours on an NVIDIA Titan Xp graphics card. These two images take less than 4 minutes to generate on an NVIDIA RTX A5000 graphics card. |
| Software Dependencies | No | An open-source framework to compute Hessian information for DNNs by power iteration and stochastic Lanczos method was developed in (Yao et al., 2020). |
| Experiment Setup | Yes | We use SGD with initial step size γ = 0.1, Lookahead with τ = 5 and the same initial step size... At epochs 60, 120, 160, the step size γ for all algorithms decreases by a factor of 5. For Lookahead with constant horizon, τ also increases by a factor of 5 at these epochs. Thus after 160 epochs, Lookahead with constant horizon uses τ = 625. The network is initialized with Kaiming initialization. |
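The Lookahead schedule quoted above (step size γ divided by 5 at each decay point, while the constant-horizon variant multiplies the sync period τ by 5 so that γτ stays fixed, ending at τ = 625) can be sketched as follows. This is an illustrative toy, not the paper's code: the objective f(w) = w², the step counts, and the slow-weight coefficient α = 0.5 are assumptions standing in for ResNet-18 training.

```python
# Minimal sketch of Lookahead (Zhang et al., 2019) with the
# constant-horizon schedule described in the table. The quadratic
# f(w) = w^2 is a hypothetical stand-in for the training loss.

def grad(w):
    # Gradient of the toy objective f(w) = w^2.
    return 2.0 * w

def lookahead_sgd(w0, gamma, tau, alpha, steps):
    """Run `steps` SGD updates on the fast weights, syncing the slow
    weights every `tau` steps via slow += alpha * (fast - slow)."""
    slow = fast = w0
    for t in range(1, steps + 1):
        fast -= gamma * grad(fast)           # inner SGD step
        if t % tau == 0:                     # synchronization point
            slow += alpha * (fast - slow)    # slow-weight update
            fast = slow                      # restart fast weights
    return slow

# Schedule from the table: gamma starts at 0.1 and is divided by 5
# at each decay point; the constant-horizon variant multiplies tau
# by 5 at the same points, keeping gamma * tau constant. Three decay
# points (epochs 60, 120, 160) take tau from 5 to 625.
gamma, tau = 0.1, 5
w = 1.0
for phase in range(3):
    w = lookahead_sgd(w, gamma, tau, alpha=0.5, steps=tau * 4)
    gamma /= 5
    tau *= 5
```

Note the design point the "constant horizon" name captures: because γ shrinks by the same factor that τ grows, the slow weights always aggregate roughly the same amount of parameter movement between synchronizations.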