STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition

Authors: Yucheng Lu, Shivani Agrawal, Suvinay Subramanian, Oleg Rybakov, Christopher De Sa, Amir Yazdanbakhsh

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, we evaluate STEP and other baselines such as ASP and SR-STE on multiple tasks including CIFAR classification, machine translation and LLM fine-tuning (BERT-Base, GPT-2). We show STEP mitigates the accuracy drop of baseline recipes and is robust to aggressive structured sparsity ratios."
Researcher Affiliation | Collaboration | (1) Department of Computer Science, Cornell University; (2) Google; (3) Google DeepMind.
Pseudocode | Yes | Algorithm 1: Proposed STEP Algorithm. (A generic N:M mask sketch follows this table.)
Open Source Code | No | The paper does not contain an explicit statement or a link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | CIFAR-10/100 (Krizhevsky et al., 2009), the GLUE benchmark (Wang et al., 2018), the WMT17 De-En translation task following (Vaswani et al., 2017), and WikiText-2 and WikiText-103 (Merity et al., 2016). (A dataset-loading sketch follows this table.)
Dataset Splits | No | The paper mentions using the "GLUE development set" for BERT fine-tuning, which implies a validation set, but it does not give train/validation/test splits (percentages, sample counts, or citations to predefined splits) for the remaining experiments, such as CIFAR or WikiText.
Hardware Specification | Yes | "All of the experiments run on a Google Cloud TPUv3-8 virtual machine."
Software Dependencies | No | The paper mentions using "deep learning libraries (Paszke et al., 2019; Heek et al., 2020)" (i.e., PyTorch and Flax) but does not provide version numbers for these or any other software components.
Experiment Setup | Yes | "For all the Adam-specific hyperparameters we adopt the default values: {β1 = 0.9, β2 = 0.999, ε = 1e-8}. For the CIFAR tasks, we adopted batch size 128 and tune the learning rate from {1e-4, 5e-5, 1e-5}; for BERT and GPT-2 fine-tuning we follow (Tang et al., 2021) and tune batch size from {8, 16, 32} and learning rate from {1e-4, 5e-5, 1e-5}; for WMT machine translation we follow the exact setup of (Vaswani et al., 2017) and (Kao et al., 2022)." (A training-setup sketch follows this table.)
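To make the N:M structured sparsity setting concrete, here is a minimal sketch of computing an N:M mask (e.g., 2:4) by keeping the largest-magnitude weights in each group. This is a generic illustration, not the paper's Algorithm 1: STEP's contribution is to learn the mask from scratch using an Adam-based precondition, and its exact scoring rule is not reproduced here.

```python
# Generic N:M structured sparsity mask (illustrative only, NOT STEP's Algorithm 1).
import torch

def nm_mask(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Keep the n largest-magnitude entries in every consecutive group of m
    along the last dimension; zero out the rest."""
    assert weight.shape[-1] % m == 0, "last dim must be divisible by m"
    groups = weight.reshape(-1, m)              # (num_groups, m)
    topk = groups.abs().topk(n, dim=-1).indices  # top-n positions per group
    mask = torch.zeros_like(groups)
    mask.scatter_(-1, topk, 1.0)
    return mask.reshape(weight.shape)

w = torch.randn(8, 16)
sparse_w = w * nm_mask(w, n=2, m=4)  # exactly 2 nonzeros per group of 4
```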
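The datasets listed in the table are all publicly available. The paper does not specify how they were obtained; the loaders below (torchvision and the Hugging Face `datasets` library) are assumptions for illustration.

```python
# Hedged sketch: fetching the public datasets named in the table.
import torchvision
from datasets import load_dataset

cifar10 = torchvision.datasets.CIFAR10(root="./data", train=True, download=True)
wikitext2 = load_dataset("wikitext", "wikitext-2-raw-v1")  # WikiText-2
glue_sst2 = load_dataset("glue", "sst2")                   # one GLUE task
```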
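The reported CIFAR setup (Adam with β1 = 0.9, β2 = 0.999, ε = 1e-8, batch size 128, learning rates swept over {1e-4, 5e-5, 1e-5}) translates into a training loop like the sketch below. The model, data loader, and the STEP masking step itself are placeholders; only the optimizer settings and learning-rate grid come from the paper.

```python
# Hedged sketch of the reported CIFAR training configuration.
import torch

def run_cifar_trial(model, train_loader, lr: float, epochs: int = 1):
    # Adam with the default hyperparameters quoted in the paper.
    optimizer = torch.optim.Adam(
        model.parameters(), lr=lr, betas=(0.9, 0.999), eps=1e-8
    )
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in train_loader:  # batch size 128 in the paper
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()
    return model

# Learning-rate grid reported for the CIFAR tasks.
lr_grid = [1e-4, 5e-5, 1e-5]
```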