Minimum Variance Unbiased N:M Sparsity for the Neural Gradients
Authors: Brian Chmiel, Itay Hubara, Ron Banner, Daniel Soudry
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we demonstrate the effectiveness of our proposed method over several vision and language models. First, we show the effect of the proposed method for fine-grained N:M structured sparsity on the neural gradients. Then, we combine this method with the fine-grained N:M transposable-weights method (Hubara et al., 2021), enabling acceleration with N:M structured sparsity in all training GEMMs. Moreover, we show that combining N:M structured sparsity in all training GEMMs with 8-bit quantization achieves small or no accuracy degradation. Experimental details appear in Appendix A.4. |
| Researcher Affiliation | Collaboration | Habana Labs (an Intel company), Caesarea, Israel; Department of Electrical Engineering, Technion, Haifa, Israel |
| Pseudocode | No | The paper mentions providing a method in Appendix A.2 ("We provide such a method in Appendix A.2.") and a reference implementation in supplementary material, but no pseudocode or algorithm blocks appear in the provided text itself. |
| Open Source Code | Yes | A reference implementation is supplied in the supplementary material. |
| Open Datasets | Yes | Table 4: Effect of applying the proposed MVUE 1:2 and approx-MVUE 2:4 on the neural gradients for different models and datasets. ... ResNet18 ImageNet ... ResNet50 ImageNet ... ViT-B16 CIFAR-10 ... BERT finetune SQuAD ... BERT pretrain Wiki ... Transformer WMT En-De |
| Dataset Splits | Yes | The paper uses well-known benchmark datasets such as ImageNet, CIFAR-10, SQuAD, Wiki, and WMT En-De, which typically have standard train/validation/test splits. Additionally, it states "Experimental details appear in Appendix A.4.", which would typically specify any non-standard splits or confirm standard ones. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. It mentions general hardware discussions in references (Nvidia A100, H100, Graphcore IPU, Habana Gaudi) but not as the experimental setup. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library names like PyTorch, TensorFlow, along with their versions) needed to replicate the experiment. |
| Experiment Setup | Yes | Experimental details appear in Appendix A.4. |
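
The Open Datasets row above quotes results for the paper's MVUE 1:2 and approx-MVUE 2:4 pruning of neural gradients. As a rough illustration only, and not the authors' reference implementation, the sketch below shows one way to build an unbiased 1:2 stochastic pruner: within each consecutive pair of gradient values, exactly one element is kept, sampled with probability proportional to its magnitude and rescaled so that the expectation equals the original value. The function name, block layout, and `eps` guard are assumptions made for this example.

```python
# Illustrative sketch of unbiased 1:2 stochastic pruning of a gradient tensor.
# This is NOT the paper's reference implementation; it only demonstrates the
# general idea of keeping one element per block of two, sampled in proportion
# to its magnitude and rescaled so that the estimator stays unbiased.
import torch

def unbiased_1_of_2_prune(grad: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Prune a tensor to 1:2 structured sparsity along its last dimension.

    For each consecutive pair (a, b), exactly one element is kept:
      - element x_i is kept with probability |x_i| / (|a| + |b|)
      - the kept element is rescaled to sign(x_i) * (|a| + |b|)
    so that E[pruned] == grad (an unbiased estimator).
    """
    assert grad.shape[-1] % 2 == 0, "last dimension must be even"
    pairs = grad.reshape(*grad.shape[:-1], -1, 2)            # (..., n_pairs, 2)
    mag = pairs.abs()
    total = mag.sum(dim=-1, keepdim=True)                     # |a| + |b| per pair
    p_first = mag[..., :1] / (total + eps)                    # P(keep first element)

    keep_first = torch.rand_like(p_first) < p_first           # Bernoulli draw per pair
    keep_mask = torch.cat([keep_first, ~keep_first], dim=-1)  # exactly one kept per pair

    # Kept element is rescaled to sign(x) * (|a| + |b|); the other becomes zero.
    pruned = torch.where(keep_mask, pairs.sign() * total, torch.zeros_like(pairs))
    return pruned.reshape(grad.shape)

if __name__ == "__main__":
    torch.manual_seed(0)
    g = torch.randn(4, 8)
    # Averaging many pruned samples should approach the original tensor,
    # which is a quick empirical check of unbiasedness.
    est = torch.stack([unbiased_1_of_2_prune(g) for _ in range(5000)]).mean(0)
    print("max abs bias:", (est - g).abs().max().item())
```

Choosing the keep probability proportional to magnitude is what makes the variance of such an unbiased 1:2 estimator small; the paper's 2:4 case uses an approximate minimum-variance scheme whose details are given in the paper itself, not reproduced here.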