Supermasks in Superposition

Authors: Mitchell Wortsman, Vivek Ramanujan, Rosanne Liu, Aniruddha Kembhavi, Mohammad Rastegari, Jason Yosinski, Ali Farhadi

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 Experiments; 4.1 Scenario GG: Task Identity Information Given During Train and Inference; 4.2 Scenarios GNs & GNu: Task Identity Information Given During Train Only; 4.3 Scenario NNs: No Task Identity During Training or Inference.
Researcher Affiliation | Collaboration | Mitchell Wortsman (University of Washington); Vivek Ramanujan (Allen Institute for AI); Rosanne Liu (ML Collective); Aniruddha Kembhavi (Allen Institute for AI); Mohammad Rastegari (University of Washington); Jason Yosinski (ML Collective); Ali Farhadi (University of Washington)
Pseudocode | Yes | Pseudo-code for both algorithms may be found in Section A of the appendix.
Open Source Code | Yes | Code available at https://github.com/RAIVNLab/supsup
Open Datasets | Yes | Datasets, Models & Training: In this experiment we validate the performance of SupSup on Split CIFAR100 and Split ImageNet. Following Wen et al. [51], Split CIFAR100 randomly partitions CIFAR100 [24] into 20 different 5-way classification problems. Similarly, Split ImageNet randomly splits the ImageNet [5] dataset into 100 different 10-way classification tasks. For Permuted MNIST [23], new tasks are created with a fixed random permutation of the pixels of MNIST. (A task-construction sketch follows the table.)
Dataset Splits | Yes | Following Wen et al. [51], Split CIFAR100 randomly partitions CIFAR100 [24] into 20 different 5-way classification problems. Similarly, Split ImageNet randomly splits the ImageNet [5] dataset into 100 different 10-way classification tasks.
Hardware Specification | Yes | In particular, on a 1080 Ti this operation requires 1% of the forward pass time for a ResNet-50.
Software Dependencies | No | The sparse supermasks are stored in the standard scipy.sparse.csc format with 16-bit integers. Moreover, SupSup requires minimal overhead in terms of forward pass compute. Elementwise product by a binary mask can be implemented via memory access, i.e. selecting indices. Modern GPUs have very high memory bandwidth so the time cost of this operation is small with respect to the time of a forward pass. PyTorch is cited (PyTorch: An Imperative Style, High-Performance Deep Learning Library, Advances in Neural Information Processing Systems, pages 8024-8035, 2019), but no specific versions are listed. (A mask-storage sketch follows the table.)
Experiment Setup | Yes | For each task we train for 1000 batches of size 128 using the RMSProp optimizer [48] with learning rate 0.0001, which follows the hyperparameters of [4]. (A training-loop sketch follows the table.)
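
The Open Datasets and Dataset Splits rows describe how tasks are constructed: 20 disjoint 5-way tasks from CIFAR100, 100 disjoint 10-way tasks from ImageNet, and one fixed pixel permutation per Permuted MNIST task. The sketch below is a hedged illustration of that construction, not the authors' code; the helper names and seed handling are assumptions.

```python
import numpy as np

def split_cifar100_tasks(num_tasks=20, classes_per_task=5, seed=0):
    """Randomly partition the 100 CIFAR100 labels into disjoint 5-way tasks."""
    rng = np.random.default_rng(seed)
    classes = rng.permutation(100)
    return [classes[i * classes_per_task:(i + 1) * classes_per_task]
            for i in range(num_tasks)]

def permuted_mnist_tasks(num_tasks, seed=0):
    """One fixed random permutation of the 28*28 pixel indices per task."""
    rng = np.random.default_rng(seed)
    return [rng.permutation(28 * 28) for _ in range(num_tasks)]

def apply_permutation(images, perm):
    """images: (N, 28, 28) array; returns copies with pixels permuted by perm."""
    flat = images.reshape(images.shape[0], -1)
    return flat[:, perm].reshape(images.shape)
```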
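The Hardware Specification and Software Dependencies rows quote two implementation details: supermasks stored in scipy.sparse.csc format with 16-bit integers, and mask application as a cheap elementwise product. A minimal sketch under those assumptions follows; the function names are illustrative, and the int16 cast assumes the index and pointer values fit in 16 bits.

```python
import numpy as np
import scipy.sparse as sp
import torch

def compress_mask(mask_2d):
    """Store a dense {0, 1} supermask as CSC index arrays cast to int16.
    Assumes row indices and column pointers both fit in 16-bit integers."""
    m = sp.csc_matrix(mask_2d.astype(np.uint8))
    return {"shape": m.shape,
            "indices": m.indices.astype(np.int16),
            "indptr": m.indptr.astype(np.int16)}

def masked_linear(weight, mask_2d, x):
    """y = (W * M) x for a binary mask M: zeroing masked-out entries amounts
    to selecting indices, which is memory-bandwidth bound on modern GPUs."""
    mask = torch.as_tensor(mask_2d, dtype=weight.dtype, device=weight.device)
    return (weight * mask) @ x
```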
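The Experiment Setup row gives the per-task schedule: 1000 batches of size 128 with RMSProp at learning rate 0.0001. The loop below is a hedged sketch of that configuration only; in SupSup the optimizer updates mask scores rather than the weights, so passing model.parameters() here is a simplification, and the data loader is assumed.

```python
import itertools
import torch

def train_one_task(model, task_loader, device="cuda", num_batches=1000, lr=1e-4):
    """Train on a single task for a fixed number of batches (sketch)."""
    opt = torch.optim.RMSprop(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    # task_loader is assumed to yield (images, labels) batches of size 128.
    for images, labels in itertools.islice(itertools.cycle(task_loader), num_batches):
        images, labels = images.to(device), labels.to(device)
        opt.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        opt.step()
```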