Validating the Lottery Ticket Hypothesis with Inertial Manifold Theory

Authors: Zeru Zhang, Jiayin Jin, Zijie Zhang, Yang Zhou, Xin Zhao, Jiaxiang Ren, Ji Liu, Lingfei Wu, Ruoming Jin, Dejing Dou

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive evaluation on real datasets demonstrates the superior performance of our proposed IMC model against several state-of-the-art neural network pruning and LTH methods. More experiments, implementation details, and hyperparameter settings are presented in Appendices A.3-A.5.
Researcher Affiliation | Collaboration | Auburn University, Baidu Research, JD.COM Silicon Valley Research Center, Kent State University, University of Oregon
Pseudocode | Yes | The following are the step-by-step algorithm descriptions of our Inertial Manifold-based neural network Compression (IMC) method: (1) Given a dense neural network f(x; W) with randomly initialized flattened weight parameters W = W0 ∈ R^d, when optimizing W with stochastic gradient descent (SGD) on a training set, we generate an approximate W* in a few training iterations (10 iterations in our implementation), where W* is a local minimum point of the loss function L with respect to W in Eq. (6) (i.e., ∇L(W*) = 0). ... and (6) We prune the original network with high-dimensional parameters W ∈ R^d to generate a subnetwork with low-dimensional parameters W+ ∈ R^(d−k), i.e., reduce W to W+, and train W+ until convergence. (A hedged code sketch of this pipeline is given after the table.)
Open Source Code | No | The paper does not provide an explicit statement or link for the open-sourcing of its code.
Open Datasets | Yes | In this section, we have evaluated the effectiveness of our IMC model and other baselines for neural network pruning over three standard image classification datasets: CIFAR-10 [28], CIFAR-100 [28], and ImageNet [10].
Dataset Splits | No | The paper mentions training and testing on CIFAR-10, CIFAR-100, and ImageNet, and states 'The experiments exactly follow the same settings described by the original LTH paper [13, 14] and other following works on LTH and network pruning', but it does not give explicit percentages or sample counts for train/validation/test splits.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for the experiments (e.g., GPU models, CPU types, or cloud resources with specifications).
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the implementation.
Experiment Setup | Yes | More experiments, implementation details, and hyperparameter settings are presented in Appendices A.3-A.5. ... (1) Given a dense neural network f(x; W) with randomly initialized flattened weight parameters W = W0 ∈ R^d, when optimizing W with stochastic gradient descent (SGD) on a training set, we generate an approximate W* in a few training iterations (10 iterations in our implementation), where W* is a local minimum point of the loss function L with respect to W in Eq. (6) (i.e., ∇L(W*) = 0). ... In our implementation, we choose p = 2 and = 0.01. ... The experiments exactly follow the same settings described by the original LTH paper [13, 14] and other following works on LTH and network pruning [70, 59, 15, 53, 69, 11].
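
The pseudocode row above outlines a six-step pipeline: approximate a local minimum W* with a handful of SGD iterations, reduce the d-dimensional parameters W to a lower-dimensional W+, and train the subnetwork to convergence. The PyTorch sketch below illustrates only that overall flow under stated assumptions: the paper's inertial-manifold-based reduction is not reproduced here, so `prune_to_subnetwork` substitutes global magnitude selection as a placeholder, and all function names, the keep ratio, and learning rates are illustrative.

```python
import torch
import torch.nn.functional as F


def approximate_local_minimum(model, loader, n_iters=10, lr=0.1):
    """Step (1): run a few SGD iterations so the flattened weights W
    approximate a local minimum W* of the training loss."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    data_iter = iter(loader)
    for _ in range(n_iters):
        try:
            x, y = next(data_iter)
        except StopIteration:
            data_iter = iter(loader)
            x, y = next(data_iter)
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()
    return model


def prune_to_subnetwork(model, keep_ratio=0.2):
    """Reduce W in R^d to W+ in R^(d-k) by zeroing pruned entries and
    masking their gradients. NOTE: global magnitude selection here is a
    placeholder for the paper's inertial-manifold-based reduction."""
    flat = torch.cat([p.detach().abs().flatten() for p in model.parameters()])
    k_keep = max(1, int(keep_ratio * flat.numel()))
    threshold = flat.topk(k_keep).values.min()
    for p in model.parameters():
        mask = (p.detach().abs() >= threshold).float()
        p.data.mul_(mask)                         # zero out the pruned coordinates
        p.register_hook(lambda g, m=mask: g * m)  # keep them frozen during retraining
    return model


def train_subnetwork(model, loader, epochs=10, lr=0.1):
    """Step (6): train the low-dimensional subnetwork W+ until convergence."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
    return model


# Typical usage (model and train_loader supplied by the caller):
#   model = approximate_local_minimum(model, train_loader, n_iters=10)
#   model = prune_to_subnetwork(model, keep_ratio=0.2)
#   model = train_subnetwork(model, train_loader)
```

Masking via gradient hooks keeps the pruned coordinates frozen at zero during retraining, which mirrors the "reduce W to W+ and train W+ until convergence" step without changing the model's layer shapes.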
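
For the experiment-setup row, the only hyperparameters quoted are p = 2, a second coefficient set to 0.01 (its symbol did not survive extraction), and the 10 SGD iterations used to approximate W*; everything else follows the original LTH settings [13, 14]. The stub below merely collects those quoted values; the key names are assumptions, not the paper's notation.

```python
# Illustrative configuration stub; key names are assumed, not the paper's notation.
IMC_SETUP = {
    "p_norm": 2,                 # "we choose p = 2" (quoted above)
    "coefficient": 0.01,         # quoted as "= 0.01"; the symbol is not recoverable here
    "sgd_iters_for_w_star": 10,  # iterations used to approximate the local minimum W*
    "datasets": ["CIFAR-10", "CIFAR-100", "ImageNet"],
    # Remaining training settings follow the original LTH papers [13, 14]
    # and the follow-up works cited in the table.
}
```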