Supermasks in Superposition

Authors: Mitchell Wortsman, Vivek Ramanujan, Rosanne Liu, Aniruddha Kembhavi, Mohammad Rastegari, Jason Yosinski, Ali Farhadi

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 Experiments; 4.1 Scenario GG: Task Identity Information Given During Train and Inference; 4.2 Scenarios GNs & GNu: Task Identity Information Given During Train Only; 4.3 Scenario NNs: No Task Identity During Training or Inference.
Researcher Affiliation | Collaboration | Mitchell Wortsman (University of Washington); Vivek Ramanujan (Allen Institute for AI); Rosanne Liu (ML Collective); Aniruddha Kembhavi (Allen Institute for AI); Mohammad Rastegari (University of Washington); Jason Yosinski (ML Collective); Ali Farhadi (University of Washington)
Pseudocode | Yes | Pseudo-code for both algorithms may be found in Section A of the appendix.
Open Source Code | Yes | Code available at https://github.com/RAIVNLab/supsup
Open Datasets | Yes | Datasets, Models & Training: In this experiment we validate the performance of SupSup on Split CIFAR100 and Split ImageNet. Following Wen et al. [51], Split CIFAR100 randomly partitions CIFAR100 [24] into 20 different 5-way classification problems. Similarly, Split ImageNet randomly splits the ImageNet [5] dataset into 100 different 10-way classification tasks. For Permuted MNIST [23], new tasks are created with a fixed random permutation of the pixels of MNIST. (A task-construction sketch follows the table.)
Dataset Splits | Yes | Following Wen et al. [51], Split CIFAR100 randomly partitions CIFAR100 [24] into 20 different 5-way classification problems. Similarly, Split ImageNet randomly splits the ImageNet [5] dataset into 100 different 10-way classification tasks.
Hardware Specification | Yes | In particular, on a 1080 Ti this operation requires 1% of the forward pass time for a ResNet-50.
Software Dependencies | No | The sparse supermasks are stored in the standard scipy.sparse.csc format with 16-bit integers. Moreover, SupSup requires minimal overhead in terms of forward pass compute. Elementwise product by a binary mask can be implemented via memory access, i.e. selecting indices. Modern GPUs have very high memory bandwidth so the time cost of this operation is small with respect to the time of a forward pass. PyTorch is cited (PyTorch: An Imperative Style, High-Performance Deep Learning Library, Advances in Neural Information Processing Systems, pages 8024-8035, 2019), but no specific versions are listed. (A mask-storage sketch follows the table.)
Experiment Setup | Yes | For each task we train for 1000 batches of size 128 using the RMSProp optimizer [48] with learning rate 0.0001, which follows the hyperparameters of [4]. (A training-loop sketch follows the table.)
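
The Open Datasets and Dataset Splits rows describe how tasks are constructed: 20 disjoint 5-way tasks from CIFAR100, 100 disjoint 10-way tasks from ImageNet, and one fixed pixel permutation per Permuted MNIST task. The sketch below is a hedged illustration of that construction, not the authors' code; the helper names and seed handling are assumptions.

```python
import numpy as np

def split_cifar100_tasks(num_tasks=20, classes_per_task=5, seed=0):
    """Randomly partition the 100 CIFAR100 labels into disjoint 5-way tasks."""
    rng = np.random.default_rng(seed)
    classes = rng.permutation(100)
    return [classes[i * classes_per_task:(i + 1) * classes_per_task]
            for i in range(num_tasks)]

def permuted_mnist_tasks(num_tasks, seed=0):
    """One fixed random permutation of the 28*28 pixel indices per task."""
    rng = np.random.default_rng(seed)
    return [rng.permutation(28 * 28) for _ in range(num_tasks)]

def apply_permutation(images, perm):
    """images: (N, 28, 28) array; returns copies with pixels permuted by perm."""
    flat = images.reshape(images.shape[0], -1)
    return flat[:, perm].reshape(images.shape)
```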
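The Hardware Specification and Software Dependencies rows quote two implementation details: supermasks stored in scipy.sparse.csc format with 16-bit integers, and mask application as a cheap elementwise product. A minimal sketch under those assumptions follows; the function names are illustrative, and the int16 cast assumes the index and pointer values fit in 16 bits.

```python
import numpy as np
import scipy.sparse as sp
import torch

def compress_mask(mask_2d):
    """Store a dense {0, 1} supermask as CSC index arrays cast to int16.
    Assumes row indices and column pointers both fit in 16-bit integers."""
    m = sp.csc_matrix(mask_2d.astype(np.uint8))
    return {"shape": m.shape,
            "indices": m.indices.astype(np.int16),
            "indptr": m.indptr.astype(np.int16)}

def masked_linear(weight, mask_2d, x):
    """y = (W * M) x for a binary mask M: zeroing masked-out entries amounts
    to selecting indices, which is memory-bandwidth bound on modern GPUs."""
    mask = torch.as_tensor(mask_2d, dtype=weight.dtype, device=weight.device)
    return (weight * mask) @ x
```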
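The Experiment Setup row gives the per-task schedule: 1000 batches of size 128 with RMSProp at learning rate 0.0001. The loop below is a hedged sketch of that configuration only; in SupSup the optimizer updates mask scores rather than the weights, so passing model.parameters() here is a simplification, and the data loader is assumed.

```python
import itertools
import torch

def train_one_task(model, task_loader, device="cuda", num_batches=1000, lr=1e-4):
    """Train on a single task for a fixed number of batches (sketch)."""
    opt = torch.optim.RMSprop(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    # task_loader is assumed to yield (images, labels) batches of size 128.
    for images, labels in itertools.islice(itertools.cycle(task_loader), num_batches):
        images, labels = images.to(device), labels.to(device)
        opt.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        opt.step()
```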