Training Neural Networks with Fixed Sparse Masks

Authors: Yi-Lin Sung, Varun Nair, Colin A. Raffel

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the efficacy of the FISH Mask in three settings: parameter-efficient transfer learning, distributed training, and training with efficient checkpointing. For parameter-efficient transfer learning, we demonstrate that our approach matches the performance of standard gradient-based training on the GLUE benchmark [48] while updating only 0.5% of the model's parameters per task. For distributed training, we evaluate FISH Mask training for both transfer learning on GLUE and training from scratch on CIFAR-10 [25]. (see the masked-update sketch after this table)
Researcher Affiliation | Academia | Yi-Lin Sung, UNC Chapel Hill, ylsung@cs.unc.edu; Varun Nair, Duke University, vn40@duke.edu; Colin Raffel, UNC Chapel Hill, craffel@gmail.com
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | We release our code publicly to promote further applications of our approach. Code for our work can be found at https://github.com/varunnair18/FISH.
Open Datasets | Yes | We focus on fine-tuning BERT-Large on the GLUE benchmark [48]... instead focused on from-scratch training of a ResNet-34 on CIFAR-10 [25].
Dataset Splits | Yes | Test set results are reported by submitting to the GLUE benchmark using the final model checkpoint following a hyper-parameter search on validation results, unless otherwise noted.
Hardware Specification | Yes | most experiments are run on a RTX 3090 GPU.
Software Dependencies | No | The paper mentions software like the "Hugging Face library" but does not specify version numbers for any key software components.
Experiment Setup | Yes | For all experiments, we fine-tune for 7 epochs and perform a hyper-parameter search across learning rate {1 × 10⁻⁴, 5 × 10⁻⁵, 1 × 10⁻⁵} and batch size {8, 16} for each GLUE task. (see the grid-search sketch after this table)
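
The Research Type row describes the core technique: select a small, fixed subset of parameters (roughly 0.5% per task) using an estimate of Fisher information, and update only those parameters during training. The snippet below is a minimal sketch of that idea, not the authors' released implementation (see the GitHub link above): it assumes a PyTorch model, approximates the diagonal Fisher with squared training-loss gradients, and the function names (`compute_fisher_mask`, `apply_mask_to_gradients`) are illustrative.

```python
# Minimal sketch of the fixed-sparse-mask idea (assumed, simplified setup):
# estimate per-parameter Fisher information from squared gradients, keep the
# top fraction of parameters, and freeze everything else by masking gradients.
import torch

def compute_fisher_mask(model, loss_fn, data_loader, keep_ratio=0.005, num_batches=16):
    """Return a binary mask per parameter, keeping the highest-Fisher entries."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters() if p.requires_grad}
    for i, (inputs, targets) in enumerate(data_loader):
        if i >= num_batches:
            break
        model.zero_grad()
        loss_fn(model(inputs), targets).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2  # squared gradient ~ diagonal Fisher estimate
    # Choose a global threshold so only `keep_ratio` of all parameters stay trainable.
    all_scores = torch.cat([f.flatten() for f in fisher.values()])
    k = max(1, int(keep_ratio * all_scores.numel()))
    threshold = torch.topk(all_scores, k).values.min()
    return {n: (f >= threshold).float() for n, f in fisher.items()}

def apply_mask_to_gradients(model, masks):
    """Zero out gradients of parameters outside the fixed sparse mask (call after backward())."""
    for n, p in model.named_parameters():
        if p.grad is not None and n in masks:
            p.grad.mul_(masks[n])
```

In a training loop, the mask would be computed once before fine-tuning and then kept fixed, with `apply_mask_to_gradients` called after each backward pass and before the optimizer step.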
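
The Experiment Setup row reports a grid search over learning rates and batch sizes with 7 training epochs per configuration. The sketch below illustrates that search loop under stated assumptions: the `train_and_evaluate` callable is hypothetical and stands in for a full fine-tuning run that returns a validation metric.

```python
# Sketch of the reported hyper-parameter grid (assumed setup; `train_and_evaluate`
# is a hypothetical helper returning a validation score for one configuration).
from itertools import product

LEARNING_RATES = [1e-4, 5e-5, 1e-5]
BATCH_SIZES = [8, 16]
NUM_EPOCHS = 7

def grid_search(train_and_evaluate):
    best_score, best_config = float("-inf"), None
    for lr, batch_size in product(LEARNING_RATES, BATCH_SIZES):
        score = train_and_evaluate(lr=lr, batch_size=batch_size, epochs=NUM_EPOCHS)
        if score > best_score:
            best_score, best_config = score, (lr, batch_size)
    return best_config, best_score
```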