Supermasks in Superposition
Authors: Mitchell Wortsman, Vivek Ramanujan, Rosanne Liu, Aniruddha Kembhavi, Mohammad Rastegari, Jason Yosinski, Ali Farhadi
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Experiments); 4.1 Scenario GG: Task Identity Information Given During Train and Inference; 4.2 Scenarios GNs & GNu: Task Identity Information Given During Train Only; 4.3 Scenario NNs: No Task Identity During Training or Inference. |
| Researcher Affiliation | Collaboration | Mitchell Wortsman (University of Washington), Vivek Ramanujan (Allen Institute for AI), Rosanne Liu (ML Collective), Aniruddha Kembhavi (Allen Institute for AI), Mohammad Rastegari (University of Washington), Jason Yosinski (ML Collective), Ali Farhadi (University of Washington) |
| Pseudocode | Yes | Pseudo-code for both algorithms may be found in Section A of the appendix. |
| Open Source Code | Yes | Code available at https://github.com/RAIVNLab/supsup |
| Open Datasets | Yes | Datasets, Models & Training: In this experiment we validate the performance of SupSup on Split CIFAR100 and Split ImageNet. Following Wen et al. [51], Split CIFAR100 randomly partitions CIFAR100 [24] into 20 different 5-way classification problems. Similarly, Split ImageNet randomly splits the ImageNet [5] dataset into 100 different 10-way classification tasks. For PermutedMNIST [23], new tasks are created with a fixed random permutation of the pixels of MNIST. |
| Dataset Splits | Yes | Following Wen et al. [51], Split CIFAR100 randomly partitions CIFAR100 [24] into 20 different 5-way classification problems. Similarly, Split ImageNet randomly splits the ImageNet [5] dataset into 100 different 10-way classification tasks. (A dataset-construction sketch follows the table.) |
| Hardware Specification | Yes | In particular, on a 1080 Ti this operation requires 1% of the forward pass time for a ResNet-50 |
| Software Dependencies | No | The sparse supermasks are stored in the standard scipy.sparse.csc format with 16-bit integers. Moreover, SupSup requires minimal overhead in terms of forward-pass compute. Elementwise product by a binary mask can be implemented via memory access, i.e. selecting indices. Modern GPUs have very high memory bandwidth so the time cost of this operation is small with respect to the time of a forward pass. PyTorch is referenced via its citation: PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, pages 8024–8035, 2019. (A storage-and-masking sketch follows the table.) |
| Experiment Setup | Yes | For each task we train for 1000 batches of size 128 using the RMSProp optimizer [48] with learning rate 0.0001, which follows the hyperparameters of [4]. (A training-loop sketch follows the table.) |
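
For reference, a minimal sketch of how the benchmark splits quoted above can be constructed. This is not the authors' released code; the random seed, the class partition, and the number of PermutedMNIST tasks are illustrative assumptions.

```python
# Minimal sketch of the quoted benchmark construction (not the released code).
# The seed, partition, and number of PermutedMNIST tasks are assumptions.
import numpy as np
import torch
from torchvision import datasets, transforms

rng = np.random.RandomState(0)

# Split CIFAR100: randomly partition the 100 classes into 20 disjoint
# 5-way classification tasks (following Wen et al. [51]).
classes = rng.permutation(100)
split_cifar100_tasks = [classes[5 * i:5 * (i + 1)].tolist() for i in range(20)]

cifar = datasets.CIFAR100("data", train=True, download=True,
                          transform=transforms.ToTensor())

def task_subset(dataset, task_classes):
    """Return the subset of examples whose label falls in this task."""
    idx = [i for i, y in enumerate(dataset.targets) if y in task_classes]
    return torch.utils.data.Subset(dataset, idx)

# PermutedMNIST: each new task applies one fixed random permutation to the
# 784 pixels of every MNIST image.
permutations = [torch.from_numpy(rng.permutation(28 * 28)) for _ in range(10)]

def permute_pixels(img, perm):
    """Apply a fixed pixel permutation to a 1x28x28 MNIST image tensor."""
    return img.view(-1)[perm].view(1, 28, 28)
```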
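
Likewise, a minimal sketch of the storage and masking scheme quoted in the Software Dependencies row: a binary supermask kept in SciPy's CSC format with 16-bit index arrays and applied by index selection rather than a dense elementwise multiply. The mask shape and sparsity are assumptions, chosen so the indices fit in 16 bits; this is not the released implementation.

```python
# Minimal sketch (not the released implementation) of storing a binary
# supermask in scipy.sparse CSC form with 16-bit integers and applying it
# by index selection. The weight shape and sparsity are assumptions.
import numpy as np
import scipy.sparse as sp
import torch

# A dense binary mask for one (out_features, in_features) weight matrix.
dense_mask = (torch.rand(256, 128) > 0.7).numpy().astype(np.uint8)

# Store in CSC format, downcasting the index arrays to 16-bit integers
# (safe here because the dimensions and nonzero count fit in int16).
csc = sp.csc_matrix(dense_mask)
stored = {
    "shape": csc.shape,
    "indices": csc.indices.astype(np.int16),
    "indptr": csc.indptr.astype(np.int16),
}

# Rebuild the mask and apply it by memory access: gather the surviving
# weights by index instead of an elementwise product with a dense 0/1 mask.
restored = sp.csc_matrix(
    (np.ones(len(stored["indices"]), dtype=np.float32),
     stored["indices"].astype(np.int64),
     stored["indptr"].astype(np.int64)),
    shape=stored["shape"],
)
rows, cols = restored.nonzero()
r = torch.from_numpy(rows.astype(np.int64))
c = torch.from_numpy(cols.astype(np.int64))

weights = torch.randn(256, 128)
masked = torch.zeros_like(weights)
masked[r, c] = weights[r, c]  # index selection, no dense multiply
```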
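
Finally, a minimal sketch of the per-task schedule quoted in the Experiment Setup row: 1000 batches of size 128 with RMSProp at learning rate 0.0001. The `model` and `task_loader` arguments are placeholders, not objects from the paper's codebase.

```python
# Minimal sketch of the quoted per-task schedule: 1000 batches of size 128
# with RMSprop at lr 1e-4. `model` and `task_loader` are placeholders.
import itertools
import torch
import torch.nn.functional as F

def train_one_task(model, task_loader, steps=1000, lr=1e-4):
    # task_loader is assumed to be a DataLoader built with batch_size=128.
    opt = torch.optim.RMSprop(model.parameters(), lr=lr)
    model.train()
    for x, y in itertools.islice(itertools.cycle(task_loader), steps):
        opt.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        opt.step()
```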