Spartan: Differentiable Sparsity via Regularized Transportation

Authors: Kai Sheng Tai, Taipeng Tian, Ser-Nam Lim

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate Spartan on ResNet-50 [21] and ViT [15] models trained on the ImageNet-1K dataset. On ResNet-50, we find that sparse models trained with Spartan achieve higher generalization accuracies than those trained with existing methods at sparsity levels of 90% and above.
Researcher Affiliation | Industry | Correspondence to: Kai Sheng Tai (kst@meta.com) ... This work was funded by Meta.
Pseudocode | Yes | Algorithm 1: Iterative magnitude pruning update ... Algorithm 2: Dual averaging / Top-KAST update ... Algorithm 3: Spartan parameter update ... Algorithm 4: Soft top-k forward pass ... Algorithm 5: Soft top-k backward pass. (An illustrative soft top-k sketch follows this table.)
Open Source Code | Yes | We provide an open source implementation of Spartan at https://github.com/facebookresearch/spartan.
Open Datasets | Yes | We train and evaluate our models on the ImageNet-1K dataset with the standard training-validation split and report means and standard deviations over 3 independent trials.
Dataset Splits | Yes | We train and evaluate our models on the ImageNet-1K dataset with the standard training-validation split and report means and standard deviations over 3 independent trials. ... Table 1: Top-1 accuracies on ImageNet-1K validation set with fully dense training.
Hardware Specification | Yes | We use mixed precision training with a batch size of 4096 on 8 NVIDIA A100 GPUs. ... We use mixed precision training with a batch size of 4096 on 16 NVIDIA A100 GPUs across 2 nodes.
Software Dependencies | No | The paper does not explicitly list any software dependencies with specific version numbers (e.g., the Python version, or library versions for PyTorch or TensorFlow).
Experiment Setup | Yes | In all our experiments, we run Spartan with the training schedule described in Section 3.2. ... For all Spartan runs, we use βmax = 10, which we selected based on models trained at 95% sparsity. ... We use mixed precision training with a batch size of 4096... We augment the training data using RandAugment [10], MixUp [45], and CutMix [44]. Our ViT models are trained from random initialization, without any pretraining. We set βmax = 20 for Spartan with unstructured sparsity, and βmax = 320 and βmax = 640 for Spartan with 16×16 and 32×32 blocks, respectively. (A hypothetical configuration sketch follows this table.)
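
The soft top-k operation named in the Pseudocode row is the core of Spartan's forward pass. The paper computes it by solving a regularized optimal transportation problem (Algorithms 4-5); as a loose illustration of the idea only, the sketch below substitutes a simpler sigmoid relaxation with a bisection search for the threshold, so that the mask is differentiable and approximately k entries stay active. The function name and interface are hypothetical, not taken from the released code.

```python
import torch

def soft_topk_mask(scores: torch.Tensor, k: int, beta: float,
                   n_iter: int = 50) -> torch.Tensor:
    """Differentiable relaxation of a top-k mask (illustrative only).

    Computes m_i = sigmoid(beta * (scores_i - tau)), with the threshold
    tau found by bisection so that m.sum() is approximately k. This is a
    stand-in for the paper's Algorithm 4, which instead solves a
    regularized optimal-transportation problem (with a matching custom
    backward pass in Algorithm 5).
    """
    # Bisection runs without gradient tracking; tau is treated as a
    # constant, so gradients flow only through the final sigmoid.
    with torch.no_grad():
        lo, hi = scores.min().item(), scores.max().item()
        for _ in range(n_iter):
            tau = 0.5 * (lo + hi)
            mass = torch.sigmoid(beta * (scores - tau)).sum().item()
            if mass > k:
                lo = tau  # too much mass kept: raise the threshold
            else:
                hi = tau  # too little mass kept: lower the threshold
    tau = 0.5 * (lo + hi)
    return torch.sigmoid(beta * (scores - tau))

# Example: keep ~10% of weights (90% sparsity) at temperature beta = 10.
w = torch.randn(1000, requires_grad=True)
mask = soft_topk_mask(w.abs(), k=100, beta=10.0)
w_sparse = w * mask  # gradients flow to w through both factors
```

As beta grows, the sigmoid sharpens toward a hard top-k indicator, which is consistent with the βmax schedule quoted above: annealing the temperature moves the relaxed mask toward exact top-k sparsity over training.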
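
For concreteness, the ViT settings quoted in the Experiment Setup and Hardware Specification rows can be gathered into a single configuration record. This is a hypothetical sketch; the field names are illustrative and are not taken from the released facebookresearch/spartan code.

```python
# Hypothetical summary of the quoted ViT training setup; field names are
# illustrative, not from the released implementation.
vit_spartan_config = {
    "dataset": "ImageNet-1K",
    "batch_size": 4096,
    "precision": "mixed",                       # mixed precision training
    "hardware": "16x NVIDIA A100 (2 nodes)",
    "augmentations": ["RandAugment", "MixUp", "CutMix"],
    "pretraining": None,                        # trained from random init
    "beta_max": {
        "unstructured": 20,
        "block_16x16": 320,
        "block_32x32": 640,
    },
}
```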