Multicoated Supermasks Enhance Hidden Networks

Authors: Yasuyuki Okoshi, Ángel López García-Arias, Kazutoshi Hirose, Kota Ando, Kazushi Kawamura, Thiem Van Chu, Masato Motomura, Jaehoon Yu

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on CIFAR-10 and ImageNet show that Multicoated Supermasks enhance the trade-off between accuracy and model size.
Researcher Affiliation | Academia | Tokyo Institute of Technology, Japan.
Pseudocode | No | The paper contains mathematical formulations and descriptions of the proposed method but does not include explicit pseudocode or algorithm blocks.
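Since the paper provides no pseudocode, the following is only an illustrative Python/PyTorch sketch of the general multicoated-supermask idea, summing several top-k% binary score masks over frozen random weights. The sparsity levels, tensor shapes, and the omission of the straight-through estimator used for training are assumptions for illustration, not the paper's exact algorithm.

```python
import torch

def multicoated_mask(scores: torch.Tensor, sparsities=(0.5, 0.7, 0.9)) -> torch.Tensor:
    """Sum of binary top-k% masks at increasing sparsity levels, so
    higher-scoring weights receive larger integer mask values.
    The sparsity levels here are arbitrary examples, not the paper's."""
    flat = scores.flatten()
    mask = torch.zeros_like(flat)
    for s in sparsities:
        k = int((1.0 - s) * flat.numel())      # number of weights kept by this coat
        idx = torch.topk(flat, k).indices      # highest-scoring entries survive
        mask[idx] += 1.0                       # each surviving coat adds one level
    return mask.view_as(scores)

# Hidden-network layer: frozen random weights modulated by a mask derived
# from trainable scores (straight-through gradients are omitted here).
weights = torch.randn(64, 128)                          # frozen, never updated
scores = torch.randn(64, 128)                           # what would be trained
effective_weights = weights * multicoated_mask(scores)
```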
Open Source Code | Yes | Code available at: https://github.com/yasu0001/multicoated-supermasks
Open Datasets | Yes | We evaluate Multicoated Supermasks for image classification using the CIFAR-10 (Krizhevsky, 2009) and ImageNet (Russakovsky et al., 2015) datasets.
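For reference, a minimal sketch of loading both datasets with their standard splits via torchvision; the root paths are placeholders, and the paper's actual data pipeline (augmentation, normalization) is not reproduced here.

```python
import torchvision
import torchvision.transforms as T

# CIFAR-10 ships with a standard train/test split in torchvision;
# "./data" is a placeholder download location.
cifar_train = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor())
cifar_test = torchvision.datasets.CIFAR10(
    root="./data", train=False, download=True, transform=T.ToTensor())

# ImageNet (ILSVRC 2012) uses the official train/val split; the archives
# must already be present under the (placeholder) root path.
imagenet_train = torchvision.datasets.ImageNet(root="/path/to/imagenet", split="train")
imagenet_val = torchvision.datasets.ImageNet(root="/path/to/imagenet", split="val")
```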
Dataset Splits | Yes | We evaluate Multicoated Supermasks for image classification using the CIFAR-10 (Krizhevsky, 2009) and ImageNet (Russakovsky et al., 2015) datasets. In CIFAR-10 experiments, the learning rate is decreased by 0.1 after 50 and 75 epochs starting from 0.1 with a batch size of 128; in ImageNet experiments, the learning rate is reduced using cosine annealing starting from 0.1, with a batch size of 256.
Hardware Specification | No | The paper does not explicitly describe the specific hardware used for running the experiments (e.g., GPU models, CPU types, or cloud instance specifications).
Software Dependencies | No | All models and experiments are implemented using MMClassification (MMClassification Contributors, 2020), a toolbox based on PyTorch (Paszke et al., 2019).
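Although the paper reports the toolchain rather than pinned dependencies, the quoted schedule could be expressed in an MMClassification-style (0.x / mmcv-style) config roughly as follows; the field values mirror the reported setup, while the file layout and any supermask-specific options are assumptions and may differ from the released repository.

```python
# Hypothetical MMClassification (0.x) config fragment for the CIFAR-10 runs;
# only the schedule-related fields described in the paper are shown.
optimizer = dict(type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
lr_config = dict(policy='step', step=[50, 75], gamma=0.1)   # decay by 0.1 at epochs 50 and 75
runner = dict(type='EpochBasedRunner', max_epochs=100)
data = dict(samples_per_gpu=128)                            # batch size 128

# ImageNet variant (batch size 256, cosine annealing):
# lr_config = dict(policy='CosineAnnealing', min_lr=0)
# data = dict(samples_per_gpu=256)
```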
Experiment Setup | Yes | In both cases, residual networks (He et al., 2016) are trained for 100 epochs using stochastic gradient descent (SGD) with weight decay of 0.0001 and momentum of 0.9. In CIFAR-10 experiments, the learning rate is decreased by 0.1 after 50 and 75 epochs starting from 0.1 with a batch size of 128; in ImageNet experiments, the learning rate is reduced using cosine annealing starting from 0.1, with a batch size of 256.
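The same setup can be sketched directly in PyTorch; the model below is a placeholder standing in for the residual networks used, and only the optimizer and learning-rate schedules follow the quoted description.

```python
import torch
import torch.nn as nn

# Placeholder model standing in for the ResNets trained in the paper.
model = nn.Linear(10, 10)

# SGD with momentum 0.9 and weight decay 0.0001, as stated in the setup.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=0.0001)

# CIFAR-10 schedule: multiply the learning rate by 0.1 after epochs 50 and 75.
cifar_scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[50, 75], gamma=0.1)

# ImageNet schedule: cosine annealing from 0.1 over the 100 training epochs.
# (In practice only one scheduler would be attached per run.)
imagenet_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=100)

for epoch in range(100):
    # ... one training epoch over the chosen dataset ...
    cifar_scheduler.step()   # or imagenet_scheduler.step() for the ImageNet runs
```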