Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Spartan: Differentiable Sparsity via Regularized Transportation
Authors: Kai Sheng Tai, Taipeng Tian, Ser Nam Lim
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Spartan on ResNet-50 [21] and ViT [15] models trained on the ImageNet-1K dataset. On ResNet-50, we find that sparse models trained with Spartan achieve higher generalization accuracies than those trained with existing methods at sparsity levels of 90% and above. |
| Researcher Affiliation | Industry | Correspondence to: Kai Sheng Tai (EMAIL) ... This work was funded by Meta. |
| Pseudocode | Yes | Algorithm 1 Iterative magnitude pruning update... Algorithm 2 Dual averaging / Top-KAST update... Algorithm 3 Spartan parameter update... Algorithm 4 Soft top-k forward pass... Algorithm 5 Soft top-k backward pass |
| Open Source Code | Yes | We provide an open source implementation of Spartan at https://github.com/facebookresearch/spartan. |
| Open Datasets | Yes | We train and evaluate our models on the ImageNet-1K dataset with the standard training-validation split and report means and standard deviations over 3 independent trials. |
| Dataset Splits | Yes | We train and evaluate our models on the ImageNet-1K dataset with the standard training-validation split and report means and standard deviations over 3 independent trials. ... Table 1: Top-1 accuracies on ImageNet-1K validation set with fully dense training. |
| Hardware Specification | Yes | We use mixed precision training with a batch size of 4096 on 8 NVIDIA A100 GPUs. ... We use mixed precision training with a batch size of 4096 on 16 NVIDIA A100 GPUs across 2 nodes. |
| Software Dependencies | No | The paper does not explicitly list any software dependencies with specific version numbers (e.g., Python version, library versions like PyTorch or TensorFlow). |
| Experiment Setup | Yes | In all our experiments, we run Spartan with the training schedule described in Section 3.2. ... For all Spartan runs, we use βmax = 10, which we selected based on models trained at 95% accuracy. ... We use mixed precision training with a batch size of 4096... We augment the training data using RandAugment [10], MixUp [45] and CutMix [44]. Our ViT models are trained from random initialization, without any pretraining. We set βmax = 20 for Spartan with unstructured sparsity, and βmax = 320 and βmax = 640 for Spartan with 16 × 16 and 32 × 32 blocks respectively. |
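
The quoted responses say that results are reported as means and standard deviations over 3 independent trials. A minimal sketch of that aggregation follows. The helper name `summarize_trials` and the accuracy values are hypothetical placeholders, not taken from the paper or its repository, and the use of the sample (n − 1) standard deviation is an assumption, since the quoted text does not say which estimator was used.

```python
from statistics import mean, stdev


def summarize_trials(accuracies):
    """Return (mean, sample standard deviation) of per-trial accuracies."""
    if len(accuracies) < 2:
        raise ValueError("need at least two trials to estimate a standard deviation")
    # stdev() uses the n - 1 denominator (sample standard deviation).
    # This choice is an assumption; the source does not specify the estimator.
    return mean(accuracies), stdev(accuracies)


# Hypothetical placeholder values for three trials, NOT results from the paper.
placeholder_top1 = [70.0, 71.0, 72.0]
avg, sd = summarize_trials(placeholder_top1)
print(f"{avg:.2f} ± {sd:.2f}")
```

If the population standard deviation is wanted instead, `statistics.pstdev` can be swapped in for `stdev`. With only three trials the two estimators differ noticeably, so it helps to state which one is reported.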