Learning Structured Sparsity in Deep Neural Networks
Authors: Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Li
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the effectiveness of our SSL using published models on three databases: MNIST, CIFAR-10, and ImageNet. Without explicit explanation, SSL starts with the network whose weights are initialized by the baseline, and speedups are measured in matrix-matrix multiplication by Caffe in a single-thread Intel Xeon E5-2630 CPU. |
| Researcher Affiliation | Academia | Wei Wen, University of Pittsburgh (wew57@pitt.edu); Chunpeng Wu, University of Pittsburgh (chw127@pitt.edu); Yandan Wang, University of Pittsburgh (yaw46@pitt.edu); Yiran Chen, University of Pittsburgh (yic52@pitt.edu); Hai Li, University of Pittsburgh (hal66@pitt.edu) |
| Pseudocode | No | The paper describes its method through mathematical formulations and textual explanations but does not include any pseudocode or algorithm blocks; a hedged code sketch of its group-Lasso regularizer is given after this table for reference. |
| Open Source Code | Yes | Our source code can be found at https://github.com/wenwei202/caffe/tree/scnn. |
| Open Datasets | Yes | We evaluate the effectiveness of our SSL using published models on three databases: MNIST, CIFAR-10, and ImageNet. |
| Dataset Splits | Yes | A 227×227 image is randomly cropped from each scaled image and mirrored for data augmentation, and only the center crop is used for validation. |
| Hardware Specification | Yes | speedups are measured in matrix-matrix multiplication by Caffe in a single-thread Intel Xeon E5-2630 CPU; on CPU (Intel Xeon) and GPU (GeForce GTX TITAN Black); Figure 7(c) shows speedups of ℓ1-norm and SSL on various platforms, including both GPU (Quadro, Tesla and Titan) and CPU (Intel Xeon E5-2630). |
| Software Dependencies | No | The paper mentions software such as 'Caffe', 'Intel Math Kernel Library', and 'CUDA cuBLAS and cuSPARSE', but it does not specify version numbers for these software dependencies. |
| Experiment Setup | Yes | Hyper-parameters are selected by cross-validation. For the ConvNet, 'we added a dropout layer with a ratio of 0.5 in the fully-connected layer to avoid over-fitting'. For ResNet, the net is finally fine-tuned with a base learning rate of 0.01, which is lower than that (i.e., 0.1) in the baseline. |
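
As the Pseudocode row notes, SSL is specified only as a regularized objective, E(W) = E_D(W) + λ·R(W) + λ_g·Σ_l R_g(W^(l)), where R_g is a group Lasso, i.e. a sum of L2 norms over weight groups such as filters and channels. Below is a minimal sketch of that regularizer, assuming a PyTorch-style 4-D convolution weight; the names `ssl_group_lasso` and `total_loss` and the 1e-4 coefficients are illustrative assumptions, not taken from the authors' Caffe implementation.

```python
import torch


def ssl_group_lasso(weight: torch.Tensor,
                    lambda_filter: float,
                    lambda_channel: float) -> torch.Tensor:
    """Group-Lasso penalty over filters and channels of one conv layer.

    `weight` is assumed to have shape (out_channels, in_channels, kH, kW).
    Each output filter and each input channel forms one group; the penalty
    is the sum of the groups' L2 norms, which drives whole groups to zero.
    """
    # Filter-wise groups: one group per output filter.
    filter_norms = weight.flatten(start_dim=1).norm(p=2, dim=1)
    # Channel-wise groups: one group per input channel.
    channel_norms = weight.transpose(0, 1).flatten(start_dim=1).norm(p=2, dim=1)
    return lambda_filter * filter_norms.sum() + lambda_channel * channel_norms.sum()


# Illustrative use: add the penalty for every conv layer to the data loss.
# The 1e-4 coefficients are placeholders; the paper only states that
# hyper-parameters were selected by cross-validation.
def total_loss(model: torch.nn.Module, data_loss: torch.Tensor) -> torch.Tensor:
    reg = sum(ssl_group_lasso(m.weight, 1e-4, 1e-4)
              for m in model.modules()
              if isinstance(m, torch.nn.Conv2d))
    return data_loss + reg
```

Because the penalty acts on whole groups rather than individual weights, driving a group's norm to zero removes an entire filter or channel, leaving a smaller dense computation rather than an irregularly sparse one.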
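That structured view is also why the speedups quoted in the Research Type and Hardware Specification rows are measured on matrix-matrix multiplication with off-the-shelf BLAS (Intel MKL on CPU, cuBLAS on GPU). The sketch below illustrates the shrinking step in NumPy, under the assumption that the convolution weight has already been lowered (im2col-style) to an (out_channels, in_channels·kH·kW) matrix; `shrink_lowered_weight` is a hypothetical helper, not part of the released code.

```python
import numpy as np


def shrink_lowered_weight(W: np.ndarray, tol: float = 0.0):
    """Remove all-zero rows and columns from a lowered conv weight matrix.

    Filter-wise sparsity zeroes whole rows; channel-wise sparsity zeroes
    whole blocks of columns. Dropping them yields a smaller dense matrix,
    so the convolution's GEMM gets proportionally faster without any
    sparse-format overhead. The matching rows of the im2col'd input must
    be dropped with the same column mask.
    """
    keep_rows = np.abs(W).max(axis=1) > tol
    keep_cols = np.abs(W).max(axis=0) > tol
    return W[np.ix_(keep_rows, keep_cols)], keep_rows, keep_cols
```

This only illustrates the argument that structured sparsity maps to a smaller dense multiplication; the paper's reported speedups come from Caffe's own convolution path on the CPUs and GPUs listed above.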