Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Training Neural Networks with Fixed Sparse Masks
Authors: Yi-Lin Sung, Varun Nair, Colin A. Raffel
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the efficacy of the FISH Mask in three settings: parameter-efficient transfer learning, distributed training, and training with efficient checkpointing. For parameter-efficient transfer learning, we demonstrate that our approach matches the performance of standard gradient-based training on the GLUE benchmark [48] while updating only 0.5% of the model's parameters per task. For distributed training, we evaluate FISH Mask training for both transfer learning on GLUE and training from scratch on CIFAR-10 [25]. |
| Researcher Affiliation | Academia | Yi-Lin Sung UNC Chapel Hill EMAIL Varun Nair Duke University EMAIL Colin Raffel UNC Chapel Hill EMAIL |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release our code publicly to promote further applications of our approach. Code for our work can be found at https://github.com/varunnair18/FISH. |
| Open Datasets | Yes | We focus on fine-tuning BERT-Large on the GLUE benchmark [48]... instead focused on from-scratch training of a ResNet-34 on CIFAR-10 [25]. |
| Dataset Splits | Yes | Test set results are reported by submitting to the GLUE benchmark using the final model checkpoint following a hyper-parameter search on validation results, unless otherwise noted. |
| Hardware Specification | Yes | most experiments are run on a RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions software like the "Hugging Face library" but does not specify version numbers for any key software components. |
| Experiment Setup | Yes | For all experiments, we fine-tune for 7 epochs and perform a hyper-parameter search across learning rate {1 × 10⁻⁴, 5 × 10⁻⁵, 1 × 10⁻⁵} and batch size {8, 16} for each GLUE task. |
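
The search space quoted in the Experiment Setup row is small enough to enumerate directly. The following is a minimal sketch, not the authors' code: it assumes the learning rates are 1e-4, 5e-5, and 1e-5, and the names `build_search_grid`, `select_best`, and `validation_score` are hypothetical helpers introduced only for illustration.

```python
from itertools import product

# Values quoted in the Experiment Setup row (learning rates assumed to be
# 1e-4, 5e-5, and 1e-5).
LEARNING_RATES = [1e-4, 5e-5, 1e-5]
BATCH_SIZES = [8, 16]
NUM_EPOCHS = 7


def build_search_grid():
    """Return one configuration dict per (learning rate, batch size) pair."""
    return [
        {"learning_rate": lr, "batch_size": bs, "num_epochs": NUM_EPOCHS}
        for lr, bs in product(LEARNING_RATES, BATCH_SIZES)
    ]


def select_best(configs, validation_score):
    """Pick the configuration with the highest validation score.

    `validation_score` is a hypothetical callable that would fine-tune a
    model with the given configuration and return its validation metric.
    """
    return max(configs, key=validation_score)


grid = build_search_grid()
print(len(grid))  # 3 learning rates x 2 batch sizes = 6 runs per GLUE task
```

Under these assumptions, reproducing the search requires six fine-tuning runs of 7 epochs each for every GLUE task, with the winning configuration chosen on validation results before the test-set submission described in the Dataset Splits row.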