Training Neural Networks with Fixed Sparse Masks

Authors: Yi-Lin Sung, Varun Nair, Colin A. Raffel

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the efficacy of the FISH Mask in three settings: parameter-efficient transfer learning, distributed training, and training with efficient checkpointing. For parameter-efficient transfer learning, we demonstrate that our approach matches the performance of standard gradient-based training on the GLUE benchmark [48] while updating only 0.5% of the model's parameters per task. For distributed training, we evaluate FISH Mask training for both transfer learning on GLUE and training from scratch on CIFAR-10 [25]. (see the masked-update sketch after this table)
Researcher Affiliation | Academia | Yi-Lin Sung, UNC Chapel Hill, ylsung@cs.unc.edu; Varun Nair, Duke University, vn40@duke.edu; Colin Raffel, UNC Chapel Hill, craffel@gmail.com
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | We release our code publicly to promote further applications of our approach. Code for our work can be found at https://github.com/varunnair18/FISH.
Open Datasets | Yes | We focus on fine-tuning BERT-Large on the GLUE benchmark [48]... instead focused on from-scratch training of a ResNet-34 on CIFAR-10 [25].
Dataset Splits | Yes | Test set results are reported by submitting to the GLUE benchmark using the final model checkpoint following a hyper-parameter search on validation results, unless otherwise noted.
Hardware Specification | Yes | most experiments are run on a RTX 3090 GPU.
Software Dependencies | No | The paper mentions software like the "Hugging Face library" but does not specify version numbers for any key software components.
Experiment Setup | Yes | For all experiments, we fine-tune for 7 epochs and perform a hyper-parameter search across learning rate {1 × 10⁻⁴, 5 × 10⁻⁵, 1 × 10⁻⁵} and batch size {8, 16} for each GLUE task. (see the grid-search sketch after this table)
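
The Research Type row describes the core technique: select a small, fixed subset of parameters (roughly 0.5% per task) using an estimate of Fisher information, and update only those parameters during training. The snippet below is a minimal sketch of that idea, not the authors' released implementation (see the GitHub link above): it assumes a PyTorch model, approximates the diagonal Fisher with squared training-loss gradients, and the function names (`compute_fisher_mask`, `apply_mask_to_gradients`) are illustrative.

```python
# Minimal sketch of the fixed-sparse-mask idea (assumed, simplified setup):
# estimate per-parameter Fisher information from squared gradients, keep the
# top fraction of parameters, and freeze everything else by masking gradients.
import torch

def compute_fisher_mask(model, loss_fn, data_loader, keep_ratio=0.005, num_batches=16):
    """Return a binary mask per parameter, keeping the highest-Fisher entries."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters() if p.requires_grad}
    for i, (inputs, targets) in enumerate(data_loader):
        if i >= num_batches:
            break
        model.zero_grad()
        loss_fn(model(inputs), targets).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2  # squared gradient ~ diagonal Fisher estimate
    # Choose a global threshold so only `keep_ratio` of all parameters stay trainable.
    all_scores = torch.cat([f.flatten() for f in fisher.values()])
    k = max(1, int(keep_ratio * all_scores.numel()))
    threshold = torch.topk(all_scores, k).values.min()
    return {n: (f >= threshold).float() for n, f in fisher.items()}

def apply_mask_to_gradients(model, masks):
    """Zero out gradients of parameters outside the fixed sparse mask (call after backward())."""
    for n, p in model.named_parameters():
        if p.grad is not None and n in masks:
            p.grad.mul_(masks[n])
```

In a training loop, the mask would be computed once before fine-tuning and then kept fixed, with `apply_mask_to_gradients` called after each backward pass and before the optimizer step.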
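
The Experiment Setup row reports a grid search over learning rates and batch sizes with 7 training epochs per configuration. The sketch below illustrates that search loop under stated assumptions: the `train_and_evaluate` callable is hypothetical and stands in for a full fine-tuning run that returns a validation metric.

```python
# Sketch of the reported hyper-parameter grid (assumed setup; `train_and_evaluate`
# is a hypothetical helper returning a validation score for one configuration).
from itertools import product

LEARNING_RATES = [1e-4, 5e-5, 1e-5]
BATCH_SIZES = [8, 16]
NUM_EPOCHS = 7

def grid_search(train_and_evaluate):
    best_score, best_config = float("-inf"), None
    for lr, batch_size in product(LEARNING_RATES, BATCH_SIZES):
        score = train_and_evaluate(lr=lr, batch_size=batch_size, epochs=NUM_EPOCHS)
        if score > best_score:
            best_score, best_config = score, (lr, batch_size)
    return best_config, best_score
```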