Improved Analysis of Clipping Algorithms for Non-convex Optimization

Authors: Bohang Zhang, Jikai Jin, Cong Fang, Liwei Wang

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments confirm the superiority of clipping-based methods in deep learning tasks. We conduct extensive experiments and find that the clipping algorithms indeed consistently outperform their unclipped counterparts. We present experimental results on three deep learning benchmarks: CIFAR-10 classification using ResNet-32, ImageNet classification using ResNet-50, and language modeling on the Penn Treebank (PTB) dataset using AWD-LSTM.
Researcher Affiliation | Academia | Bohang Zhang (Key Laboratory of Machine Perception, MOE, School of EECS, Peking University; zhangbohang@pku.edu.cn); Jikai Jin (School of Mathematical Sciences, Peking University; jkjin@pku.edu.cn); Cong Fang (University of Pennsylvania; fangcong@pku.edu.cn); Liwei Wang (Key Laboratory of Machine Perception, MOE, School of EECS, Peking University; Center of Data Science, Peking University; wanglw@cis.pku.edu.cn)
Pseudocode | Yes | Algorithm 1: The General Clipping Framework
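The paper's Algorithm 1 is a general clipping framework; the paper itself should be consulted for the exact update rule. As a hedged illustration only, the sketch below shows one common instantiation, clipped SGD with momentum, assuming the effective step size is capped at min(η, γ/‖m‖) so each update has norm at most γ. The function name and the 1e-12 stabilizer are illustrative choices, not from the paper.

```python
import numpy as np

def clipped_sgd_momentum_step(x, m, grad, eta=1.0, beta=0.9, gamma=0.25):
    """One illustrative step of clipped SGD with momentum.

    The momentum buffer is an exponential moving average of gradients,
    and the step size is capped so the update norm never exceeds gamma.
    """
    m = beta * m + (1 - beta) * grad                    # momentum buffer update
    h = min(eta, gamma / (np.linalg.norm(m) + 1e-12))   # clipped step size
    return x - h * m, m
```

With a very large gradient the update norm stays bounded by γ, which is the boundedness property clipping analyses rely on; with small gradients the rule reduces to plain SGD with momentum at step size η.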
Open Source Code | Yes | Our code is available at https://github.com/zbh2047/clipping-algorithms.
Open Datasets | Yes | CIFAR-10 classification using ResNet-32, ImageNet classification using ResNet-50, and language modeling on the Penn Treebank (PTB) dataset using AWD-LSTM.
Dataset Splits | Yes | CIFAR-10 classification using ResNet-32, ImageNet classification using ResNet-50, and language modeling on the Penn Treebank (PTB) dataset using AWD-LSTM.
Hardware Specification | No | The paper states 'We use batch size 256 on 4 GPUs.' but does not specify the GPU model or any other hardware components.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | We set learning rate η = 1.0, momentum β = 0.9, and minibatch size 128, following common practice. For all the clipping algorithms, we choose the best η and γ based on a coarse grid search, while keeping the other hyper-parameters and the training strategy the same as for SGD+momentum. We simply set the hyperparameters ν = 0.7 and β = 0.999 in mixed clipping, as suggested in Ma and Yarats [2018] (for its unclipped counterpart QHM).
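The mixed-clipping hyperparameters quoted above come from QHM (Ma and Yarats [2018]), whose update direction interpolates between the raw gradient and a momentum buffer with weight ν. The sketch below shows how those values might plug into a clipped QHM step; the clipping rule (capping the update norm at γ), the function name, and the γ value are assumptions for illustration, not the paper's exact mixed-clipping algorithm.

```python
import numpy as np

def mixed_clipping_step(x, m, grad, eta=1.0, gamma=0.25, nu=0.7, beta=0.999):
    """Illustrative clipped QHM step using the hyperparameters quoted above."""
    m = beta * m + (1 - beta) * grad      # slow momentum buffer (beta = 0.999)
    d = (1 - nu) * grad + nu * m          # QHM direction: nu-weighted mix
    h = min(eta, gamma / (np.linalg.norm(d) + 1e-12))  # cap update norm at gamma
    return x - h * d, m
```

The grid search described in the paper would then vary η and γ while holding ν and β fixed at the QHM defaults.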