Minimum Variance Unbiased N:M Sparsity for the Neural Gradients

Authors: Brian Chmiel, Itay Hubara, Ron Banner, Daniel Soudry

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we demonstrate the effectiveness of our proposed method over several vision and language models. First we show the effect of the proposed method for fine-grained N:M structured sparsity on the neural gradients. Then we combine this method with the fine-grained N:M transposable-weights method (Hubara et al., 2021), allowing acceleration with N:M structured sparsity in all training GEMMs. Moreover, we show that combining N:M structured sparsity in all training GEMMs with 8-bit quantization achieves no or small accuracy degradation. Experimental details appear in Appendix A.4. (An illustrative sketch of this kind of unbiased N:M gradient pruning follows the table.)
Researcher Affiliation | Collaboration | Habana Labs, an Intel company, Caesarea, Israel; Department of Electrical Engineering, Technion, Haifa, Israel
Pseudocode | No | The paper mentions providing a method in Appendix A.2 ("We provide such a method in Appendix A.2.") and a reference implementation in the supplementary material, but the pseudocode or algorithm blocks themselves are not directly present in the provided text.
Open Source Code | Yes | A reference implementation is supplied in the supplementary material.
Open Datasets | Yes | Table 4 (effect of applying the proposed MVUE 1:2 and approx-MVUE 2:4 on the neural gradients for different models and datasets) covers ResNet18 on ImageNet, ResNet50 on ImageNet, ViT-B16 on CIFAR-10, BERT fine-tuning on SQuAD, BERT pretraining on Wiki, and a Transformer on WMT En-De.
Dataset Splits | Yes | The paper uses well-known benchmark datasets such as ImageNet, CIFAR-10, SQuAD, Wiki, and WMT En-De, which typically have standard train/validation/test splits. Additionally, it states "Experimental details appear in Appendix A.4.", which would typically specify any non-standard splits or confirm standard ones.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. It mentions general hardware discussions in references (Nvidia A100, H100, Graphcore IPU, Habana Gaudi) but not as the experimental setup.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library names like PyTorch or TensorFlow along with their versions) needed to replicate the experiments.
Experiment Setup | Yes | Experimental details appear in Appendix A.4.
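
To make the sparsification the table refers to more concrete, below is a minimal NumPy sketch of magnitude-proportional 1:2 stochastic pruning with rescaling, the basic mechanism behind unbiased N:M gradient estimates. It is not the authors' reference implementation (that is in their supplementary material); the function name `mvue_1_2_prune` and the choice to group entries in consecutive pairs along the flattened tensor are illustrative assumptions. The paper's MVUE 1:2 and approx-MVUE 2:4 estimators additionally fix the sampling probabilities so that the estimator's variance is minimized.

```python
import numpy as np

def mvue_1_2_prune(grad, rng=None):
    """Sketch of unbiased 1:2 structured pruning of a gradient tensor.

    For every consecutive pair of entries, keep exactly one entry, chosen
    with probability proportional to its magnitude, and rescale it so the
    sparse result is an unbiased estimate of the dense pair.
    """
    assert grad.size % 2 == 0, "illustration assumes an even number of entries"
    rng = np.random.default_rng() if rng is None else rng

    flat = grad.reshape(-1, 2)            # group entries into pairs (assumption)
    mag = np.abs(flat)
    total = mag.sum(axis=1)               # |x1| + |x2| per pair

    # Probability of keeping the first entry; all-zero pairs get 0.5 arbitrarily.
    p_first = np.divide(mag[:, 0], total,
                        out=np.full_like(total, 0.5), where=total > 0)
    keep_first = rng.random(p_first.shape) < p_first

    out = np.zeros_like(flat)
    # Kept entry x_i is rescaled to x_i / p_i = sign(x_i) * (|x1| + |x2|).
    out[keep_first, 0] = np.sign(flat[keep_first, 0]) * total[keep_first]
    out[~keep_first, 1] = np.sign(flat[~keep_first, 1]) * total[~keep_first]
    return out.reshape(grad.shape)

# Sanity check: averaging many sampled sparse gradients recovers the dense one.
g = np.random.randn(4, 8).astype(np.float32)
est = np.mean([mvue_1_2_prune(g) for _ in range(20000)], axis=0)
print(np.max(np.abs(est - g)))  # small value -> estimator is empirically unbiased
```

The sanity check at the end averages many sampled sparse gradients and compares them with the dense gradient; the gap shrinks as the number of samples grows, which is the unbiasedness property the paper's Table 4 experiments rely on.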