Validating the Lottery Ticket Hypothesis with Inertial Manifold Theory
Authors: Zeru Zhang, Jiayin Jin, Zijie Zhang, Yang Zhou, Xin Zhao, Jiaxiang Ren, Ji Liu, Lingfei Wu, Ruoming Jin, Dejing Dou
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluation on real datasets demonstrates the superior performance of our proposed IMC model against several state-of-the-art neural network pruning and LTH methods. More experiments, implementation details, and hyperparameter settings are presented in Appendices A.3-A.5. |
| Researcher Affiliation | Collaboration | 1Auburn University, 2Baidu Research, 3JD.COM Silicon Valley Research Center, 4Kent State University, 5University of Oregon |
| Pseudocode | Yes | The following are the algorithm descriptions of our Inertial Manifold-based neural network Compression (IMC) method step by step: (1) Given a dense neural network f(x; W) with randomly initialized flattened weight parameters W = W_0 ∈ ℝ^d, when optimizing W with stochastic gradient descent (SGD) on a training set, we generate an approximate W* in a few training iterations (10 iterations in our implementation), where W* is a local minimum point of the loss function L with respect to W in Eq. (6) (i.e., ∇L(W*) = 0). ... and (6) We prune the original network with high-dimensional parameters W ∈ ℝ^d to generate a subnetwork with low-dimensional parameters W^+ ∈ ℝ^{d−k}, i.e., reduce W to W^+, and train W^+ until convergence. (A hedged code sketch of this pipeline appears after the table.) |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-sourcing of its code. |
| Open Datasets | Yes | In this section, we have evaluated the effectiveness of our IMC model and other baselines for neural network pruning over three standard image classification datasets: CIFAR-10 [28], CIFAR-100 [28], and ImageNet [10]. |
| Dataset Splits | No | The paper mentions training and testing on datasets like CIFAR-10, CIFAR-100, and ImageNet, and states 'The experiments exactly follow the same settings described by the original LTH paper [13, 14] and other following works on LTH and network pruning', but does not explicitly provide specific percentages or sample counts for train/validation/test splits within the paper. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for the experiments (e.g., GPU models, CPU types, or cloud resources with specifications). |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the implementation. |
| Experiment Setup | Yes | More experiments, implementation details, and hyperparameter settings are presented in Appendices A.3-A.5. ... (1) Given a dense neural network f(x; W) with randomly initialized flattened weight parameters W = W_0 ∈ ℝ^d, when optimizing W with stochastic gradient descent (SGD) on a training set, we generate an approximate W* in a few training iterations (10 iterations in our implementation), where W* is a local minimum point of the loss function L with respect to W in Eq. (6) (i.e., ∇L(W*) = 0). ... In our implementation, we choose p = 2 and = 0.01. ... The experiments exactly follow the same settings described by the original LTH paper [13, 14] and other following works on LTH and network pruning [70, 59, 15, 53, 69, 11]. |
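
For concreteness, the following is a minimal sketch of the prune-then-retrain pipeline quoted in the Pseudocode row, written in PyTorch. It is an illustration under stated assumptions, not the authors' implementation: the function names (`warmup_sgd`, `select_masks`, `train_subnetwork`), the keep ratio, the learning rates, and the epoch counts are all assumed, and the coordinate-selection step uses a plain magnitude heuristic as a stand-in for the paper's inertial-manifold-based criterion, which is not reproduced here.

```python
# Minimal sketch of an IMC-style prune-then-retrain pipeline (assumed, not the authors' code).
# The selection of retained coordinates uses a simple magnitude heuristic as a placeholder
# for the paper's inertial-manifold-based criterion.
from itertools import cycle

import torch
import torch.nn.functional as F


def warmup_sgd(model, loader, n_iters=10, lr=0.1):
    """Run a few SGD iterations to reach an approximate stationary point W* of the loss."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    batches = cycle(loader)
    for _ in range(n_iters):
        x, y = next(batches)
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()
    return model


def select_masks(model, keep_ratio=0.2):
    """Choose the low-dimensional coordinates W+ to keep (magnitude-based placeholder)."""
    scores = torch.cat([p.detach().abs().flatten() for p in model.parameters()])
    k = max(1, int(keep_ratio * scores.numel()))
    threshold = torch.topk(scores, k).values.min()
    return [(p.detach().abs() >= threshold).float() for p in model.parameters()]


def train_subnetwork(model, masks, loader, epochs=1, lr=0.01):
    """Zero the pruned coordinates and train only the retained parameters W+."""
    with torch.no_grad():
        for p, m in zip(model.parameters(), masks):
            p.mul_(m)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            with torch.no_grad():
                for p, m in zip(model.parameters(), masks):
                    p.grad.mul_(m)  # keep pruned coordinates frozen at zero
            opt.step()
    return model


# Usage sketch (a CIFAR-style `train_loader` and classifier `net` are assumed to exist):
# net = warmup_sgd(net, train_loader, n_iters=10)
# masks = select_masks(net, keep_ratio=0.2)
# net = train_subnetwork(net, masks, train_loader, epochs=10)
```

Masking both the weights and their gradients keeps the pruned coordinates at exactly zero under plain SGD, which is what the quoted step (6), "reduce W to W^+, and train W^+ until convergence," requires of any concrete implementation.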