Training Neural Networks with Fixed Sparse Masks
Authors: Yi-Lin Sung, Varun Nair, Colin A. Raffel
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the efficacy of the FISH Mask in three settings: parameter-efficient transfer learning, distributed training, and training with efficient checkpointing. For parameter-efficient transfer learning, we demonstrate that our approach matches the performance of standard gradient-based training on the GLUE benchmark [48] while updating only 0.5% of the model's parameters per task. For distributed training, we evaluate FISH Mask training for both transfer learning on GLUE and training from scratch on CIFAR-10 [25]. |
| Researcher Affiliation | Academia | Yi-Lin Sung (UNC Chapel Hill, ylsung@cs.unc.edu); Varun Nair (Duke University, vn40@duke.edu); Colin Raffel (UNC Chapel Hill, craffel@gmail.com) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release our code publicly to promote further applications of our approach. Code for our work can be found at https://github.com/varunnair18/FISH. |
| Open Datasets | Yes | We focus on fine-tuning BERT-LARGE on the GLUE benchmark [48]... instead focused on from-scratch training of a ResNet-34 on CIFAR-10 [25]. |
| Dataset Splits | Yes | Test set results are reported by submitting to the GLUE benchmark using the final model checkpoint following a hyper-parameter search on validation results, unless otherwise noted. |
| Hardware Specification | Yes | most experiments are run on a RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions software like the "Hugging Face library" but does not specify version numbers for any key software components. |
| Experiment Setup | Yes | For all experiments, we fine-tune for 7 epochs and perform a hyper-parameter search across learning rate {1 × 10⁻⁴, 5 × 10⁻⁵, 1 × 10⁻⁵} and batch size {8, 16} for each GLUE task. |
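
The search space in the Experiment Setup row is small enough to enumerate by hand. The sketch below lists it, assuming the learning rates read as 1e-4, 5e-5, and 1e-5 and that the search is a full grid over both settings (the row does not say whether every combination was run). The names `build_search_grid`, `LEARNING_RATES`, `BATCH_SIZES`, and `NUM_EPOCHS` are hypothetical and do not come from the authors' released code.

```python
from itertools import product

# Values read from the Experiment Setup row; the exponents are an
# assumption about how the learning rates were originally typeset.
LEARNING_RATES = [1e-4, 5e-5, 1e-5]
BATCH_SIZES = [8, 16]
NUM_EPOCHS = 7


def build_search_grid():
    """Return one config dict per (learning rate, batch size) pair."""
    return [
        {"learning_rate": lr, "batch_size": bs, "num_epochs": NUM_EPOCHS}
        for lr, bs in product(LEARNING_RATES, BATCH_SIZES)
    ]


if __name__ == "__main__":
    # Print each candidate configuration for a single GLUE task.
    for config in build_search_grid():
        print(config)
```

Under these assumptions, a full grid gives six fine-tuning runs per GLUE task, each lasting 7 epochs.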