Multicoated Supermasks Enhance Hidden Networks
Authors: Yasuyuki Okoshi, Ángel López García-Arias, Kazutoshi Hirose, Kota Ando, Kazushi Kawamura, Thiem Van Chu, Masato Motomura, Jaehoon Yu
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on CIFAR-10 and ImageNet show that Multicoated Supermasks enhance the trade-off between accuracy and model size. |
| Researcher Affiliation | Academia | Tokyo Institute of Technology, Japan. |
| Pseudocode | No | The paper contains mathematical formulations and descriptions of the proposed method but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code available at: https://github.com/yasu0001/multicoated-supermasks |
| Open Datasets | Yes | We evaluate Multicoated Supermasks for image classification using the CIFAR-10 (Krizhevsky, 2009) and ImageNet (Russakovsky et al., 2015) datasets. |
| Dataset Splits | Yes | We evaluate Multicoated Supermasks for image classification using the CIFAR-10 (Krizhevsky, 2009) and ImageNet (Russakovsky et al., 2015) datasets. In CIFAR-10 experiments, the learning rate is decreased by 0.1 after 50 and 75 epochs starting from 0.1 with a batch size of 128; in ImageNet experiments, the learning rate is reduced using cosine annealing starting from 0.1, with a batch size of 256. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used for running the experiments (e.g., GPU models, CPU types, or cloud instance specifications). |
| Software Dependencies | No | The paper states only that all models and experiments are implemented using MMClassification (MMClassification Contributors, 2020), a toolbox based on PyTorch (Paszke et al., 2019). |
| Experiment Setup | Yes | In both cases residual networks (He et al., 2016) are trained for 100 epochs using stochastic gradient descent (SGD) with weight decay of 0.0001 and momentum of 0.9. In CIFAR-10 experiments, the learning rate is decreased by 0.1 after 50 and 75 epochs starting from 0.1 with a batch size of 128; in ImageNet experiments, the learning rate is reduced using cosine annealing starting from 0.1, with a batch size of 256. A PyTorch sketch of this configuration appears below the table. |
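
For readers who want to mirror the reported setup, here is a minimal sketch in plain PyTorch rather than MMClassification. The optimizer and learning-rate schedules follow the values quoted above; the use of torchvision's `resnet18` and the helper name `build_optimizer_and_scheduler` are illustrative assumptions, not details from the paper.

```python
# Hedged sketch of the reported optimization setup in plain PyTorch.
# The paper uses MMClassification; the model choice here (torchvision's
# resnet18) is an illustrative stand-in for its residual networks.
import torch
import torchvision


def build_optimizer_and_scheduler(model, dataset: str):
    """Return SGD plus the learning-rate schedule reported for the dataset."""
    # SGD with the reported hyperparameters: lr 0.1, momentum 0.9, weight decay 1e-4.
    optimizer = torch.optim.SGD(
        model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
    if dataset == "cifar10":
        # CIFAR-10: multiply the learning rate by 0.1 after epochs 50 and 75
        # (100 training epochs in total, batch size 128).
        scheduler = torch.optim.lr_scheduler.MultiStepLR(
            optimizer, milestones=[50, 75], gamma=0.1)
    else:
        # ImageNet: cosine annealing from 0.1 over the 100 training epochs
        # (batch size 256).
        scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
            optimizer, T_max=100)
    return optimizer, scheduler


# Usage example for the CIFAR-10 configuration.
model = torchvision.models.resnet18(num_classes=10)
optimizer, scheduler = build_optimizer_and_scheduler(model, "cifar10")
```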