Learning Structured Sparsity in Deep Neural Networks

Authors: Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Li

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental (4 experiments) | We evaluate the effectiveness of our SSL using published models on three databases: MNIST, CIFAR-10, and ImageNet. Without explicit explanation, SSL starts with the network whose weights are initialized by the baseline, and speedups are measured in matrix-matrix multiplication by Caffe on a single-thread Intel Xeon E5-2630 CPU.
Researcher Affiliation | Academia | Wei Wen (University of Pittsburgh, wew57@pitt.edu); Chunpeng Wu (University of Pittsburgh, chw127@pitt.edu); Yandan Wang (University of Pittsburgh, yaw46@pitt.edu); Yiran Chen (University of Pittsburgh, yic52@pitt.edu); Hai Li (University of Pittsburgh, hal66@pitt.edu)
Pseudocode | No | The paper describes its method through mathematical formulations and textual explanations but does not include any pseudocode or algorithm blocks (a minimal sketch of its group-Lasso regularizer is given after the table).
Open Source Code | Yes | Our source code can be found at https://github.com/wenwei202/caffe/tree/scnn.
Open Datasets | Yes | We evaluate the effectiveness of our SSL using published models on three databases: MNIST, CIFAR-10, and ImageNet.
Dataset Splits | Yes | A 227×227 image is randomly cropped from each scaled image and mirrored for data augmentation, and only the center crop is used for validation (see the augmentation sketch after the table).
Hardware Specification | Yes | Speedups are measured in matrix-matrix multiplication by Caffe on a single-thread Intel Xeon E5-2630 CPU; experiments also run on CPU (Intel Xeon) and GPU (GeForce GTX TITAN Black). Figure 7(c) shows speedups of ℓ1-norm and SSL on various platforms, including both GPU (Quadro, Tesla, and Titan) and CPU (Intel Xeon E5-2630).
Software Dependencies | No | The paper mentions software such as Caffe, the Intel Math Kernel Library, and CUDA cuBLAS and cuSPARSE, but it does not specify version numbers for any of these dependencies.
Experiment Setup | Yes | Hyper-parameters are selected by cross-validation. For the ConvNet, a dropout layer with a ratio of 0.5 is added in the fully-connected layer to avoid over-fitting. For ResNet, the net is finally fine-tuned with a base learning rate of 0.01, which is lower than that (i.e., 0.1) in the baseline.
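
As context for the Pseudocode row: the paper's formulation adds a group-Lasso term, λ_g · Σ_l Σ_g ||w^(l,g)||_2, over structured groups of weights (e.g., whole filters or whole channels) on top of the usual data loss. The NumPy sketch below is only an illustration of that penalty, not the authors' Caffe implementation; the tensor shape and the λ_g value are placeholder assumptions.

```python
import numpy as np

def group_lasso(W, group_axes):
    # Group Lasso: sum over groups of the L2 norm of each group,
    # where a group is everything collapsed along `group_axes`.
    return np.sqrt((W ** 2).sum(axis=group_axes)).sum()

# Hypothetical conv weights: 64 filters, 32 input channels, 3x3 kernels.
W = np.random.randn(64, 32, 3, 3)

filter_wise  = group_lasso(W, (1, 2, 3))   # one group per output filter
channel_wise = group_lasso(W, (0, 2, 3))   # one group per input channel

lambda_g = 1e-4                            # placeholder strength, not taken from the paper
penalty = lambda_g * (filter_wise + channel_wise)
```

Because the penalty zeroes out entire filters or channels, the remaining computation stays dense, which is why the paper can report speedups from ordinary matrix-matrix multiplication rather than sparse kernels.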
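
The Dataset Splits row quotes the standard AlexNet-style ImageNet preprocessing: a random 227×227 crop with horizontal mirroring during training, and a deterministic center crop for validation. A minimal NumPy sketch of that split-dependent preprocessing, assuming HWC images already rescaled as in the paper, might look like this (the function names are illustrative, not from the released Caffe code):

```python
import numpy as np

def train_crop(img, size=227):
    # Training: random 227x227 crop plus random horizontal mirroring.
    h, w, _ = img.shape
    top = np.random.randint(0, h - size + 1)
    left = np.random.randint(0, w - size + 1)
    crop = img[top:top + size, left:left + size]
    if np.random.rand() < 0.5:
        crop = crop[:, ::-1]               # mirror horizontally
    return crop

def val_crop(img, size=227):
    # Validation: deterministic center crop only, no mirroring.
    h, w, _ = img.shape
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]
```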