Spartan: Differentiable Sparsity via Regularized Transportation
Authors: Kai Sheng Tai, Taipeng Tian, Ser-Nam Lim
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Spartan on ResNet-50 [21] and ViT [15] models trained on the ImageNet-1K dataset. On ResNet-50, we find that sparse models trained with Spartan achieve higher generalization accuracies than those trained with existing methods at sparsity levels of 90% and above. |
| Researcher Affiliation | Industry | Correspondence to: Kai Sheng Tai (kst@meta.com) ... This work was funded by Meta. |
| Pseudocode | Yes | Algorithm 1 Iterative magnitude pruning update... Algorithm 2 Dual averaging / Top-KAST update... Algorithm 3 Spartan parameter update... Algorithm 4 Soft top-k forward pass... Algorithm 5 Soft top-k backward pass [an illustrative soft top-k sketch appears below the table] |
| Open Source Code | Yes | We provide an open source implementation of Spartan at https://github.com/facebookresearch/spartan. |
| Open Datasets | Yes | We train and evaluate our models on the ImageNet-1K dataset with the standard training-validation split and report means and standard deviations over 3 independent trials. |
| Dataset Splits | Yes | We train and evaluate our models on the ImageNet-1K dataset with the standard training-validation split and report means and standard deviations over 3 independent trials. ... Table 1: Top-1 accuracies on ImageNet-1K validation set with fully dense training. |
| Hardware Specification | Yes | We use mixed precision training with a batch size of 4096 on 8 NVIDIA A100 GPUs. ... We use mixed precision training with a batch size of 4096 on 16 NVIDIA A100 GPUs across 2 nodes. |
| Software Dependencies | No | The paper does not explicitly list any software dependencies with specific version numbers (e.g., Python version, library versions like PyTorch or TensorFlow). |
| Experiment Setup | Yes | In all our experiments, we run Spartan with the training schedule described in Section 3.2. ... For all Spartan runs, we use βmax = 10, which we selected based on models trained at 95% sparsity. ... We use mixed precision training with a batch size of 4096... We augment the training data using RandAugment [10], MixUp [45] and CutMix [44]. Our ViT models are trained from random initialization, without any pretraining. We set βmax = 20 for Spartan with unstructured sparsity, and βmax = 320 and βmax = 640 for Spartan with 16×16 and 32×32 blocks respectively. [A sketch of a possible β schedule appears below the table.] |
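
The Pseudocode row lists soft top-k forward and backward passes (Algorithms 4 and 5) as Spartan's core primitive. The snippet below is a minimal sketch of the general soft top-k idea only, assuming a sigmoid relaxation with a bisection search for the mask threshold; it is not the paper's regularized-transportation derivation, and `soft_topk_mask`, `beta`, `iters`, and `nu` are illustrative names.

```python
import torch

def soft_topk_mask(scores: torch.Tensor, k: int, beta: float = 10.0,
                   iters: int = 50) -> torch.Tensor:
    """Hypothetical sketch of a differentiable top-k relaxation.

    Finds a threshold nu by bisection so that the soft mask
    sigmoid(beta * (scores - nu)) sums to roughly k, then applies it.
    As beta grows, the mask approaches a hard top-k indicator.
    NOTE: this is not Spartan's Algorithms 4-5, which are derived
    from regularized optimal transport.
    """
    with torch.no_grad():  # the threshold search itself needs no gradients
        lo = scores.min() - 1.0
        hi = scores.max() + 1.0
        for _ in range(iters):
            nu = (lo + hi) / 2
            if torch.sigmoid(beta * (scores - nu)).sum() > k:
                lo = nu  # mask too dense: raise the threshold
            else:
                hi = nu  # mask too sparse: lower the threshold
        nu = (lo + hi) / 2
    # Gradients flow through the final sigmoid only; the paper instead
    # defines a custom backward pass (its Algorithm 5).
    return torch.sigmoid(beta * (scores - nu))


# Usage: mask parameter magnitudes at 90% sparsity (keep the top 10%).
w = torch.randn(1000, requires_grad=True)
mask = soft_topk_mask(w.abs(), k=100, beta=10.0)
w_sparse = w * mask  # soft mask concentrates weight on ~100 entries
```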
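
The Experiment Setup row quotes per-model values of βmax, the endpoint of the sharpness schedule described in the paper's Section 3.2. Since the schedule itself is not quoted here, the sketch below assumes a simple linear ramp from a small initial β to βmax; `beta_at_step` and its parameters are hypothetical.

```python
def beta_at_step(step: int, total_steps: int, beta_max: float,
                 beta_min: float = 1.0, warmup_frac: float = 0.8) -> float:
    """Hypothetical linear ramp for the soft top-k sharpness beta.

    Ramps beta from beta_min to beta_max over the first warmup_frac of
    training, then holds it at beta_max (the value tuned per model in
    the Experiment Setup row, e.g. beta_max = 10 for ResNet-50).
    """
    ramp_steps = int(warmup_frac * total_steps)
    if step >= ramp_steps:
        return beta_max
    return beta_min + (beta_max - beta_min) * step / ramp_steps


# e.g., per-step beta midway through a run with the ResNet-50 setting:
beta = beta_at_step(step=50_000, total_steps=100_000, beta_max=10.0)
```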