Decomposable-Net: Scalable Low-Rank Compression for Neural Networks

Authors: Atsushi Yaguchi, Taiji Suzuki, Shuhei Nitta, Yukinobu Sakata, Akiyuki Tanizawa

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments on the ImageNet classification task, Decomposable-Net yields superior accuracy in a wide range of model sizes. We evaluate our methods on image-classification tasks of CIFAR-10/100 [Krizhevsky, 2009] and ImageNet [Deng et al., 2009] datasets using deep CNNs.
Researcher Affiliation | Collaboration | Atsushi Yaguchi^1, Taiji Suzuki^2,3, Shuhei Nitta^1, Yukinobu Sakata^1 and Akiyuki Tanizawa^1. ^1 Toshiba Corporation, Japan; ^2 The University of Tokyo, Japan; ^3 RIKEN Center for Advanced Intelligence Project, Japan
Pseudocode | Yes | The pseudo-code for learning Decomposable-Net is given in Algorithm 1. (A sketch of the underlying low-rank step is given below the table.)
Open Source Code | No | The paper provides links to third-party code used for comparison (e.g., TRP, VBMF) but does not state that the code for Decomposable-Net itself is open source, nor does it provide a link to it.
Open Datasets | Yes | We evaluate our methods on image-classification tasks of CIFAR-10/100 [Krizhevsky, 2009] and ImageNet [Deng et al., 2009] datasets using deep CNNs.
Dataset Splits | Yes | All methods are evaluated in terms of the tradeoff between validation (top-1) accuracy and the number of multiply-accumulate operations (MACs). ... we follow the same baseline setup for the CIFAR datasets as used by [Zagoruyko and Komodakis, 2016], and the setup of [Yu et al., 2019] for the ImageNet dataset.
Hardware Specification | Yes | The wall-clock time for training Decomposable-Net with ResNet-50 on the ImageNet dataset was 6.9 days with eight NVIDIA V100 GPUs.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) used for its experiments.
Experiment Setup | Yes | We experimentally tuned hyperparameters, and set αu = 0.25, 0.5, and 0.8, respectively, for the CIFAR-10, CIFAR-100, and ImageNet datasets, while fixing αl = 0.01 for all datasets. Except for the results shown in Figure 2(a), we set λ = 0.5 to balance the performance of full- and low-rank models.
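As a point of reference for the Pseudocode row above: Decomposable-Net builds on low-rank factorization of layer weights, and the minimal Python sketch below shows the truncated-SVD step that such compression methods rest on. This is an illustration under our own assumptions (the function name low_rank_factors, the layer shapes, and the NumPy setting are ours), not a reproduction of the paper's Algorithm 1, which, per the quoted setup, additionally balances the performance of the full- and low-rank models during training.

# Minimal sketch (our assumptions, not the authors' Algorithm 1):
# factorize one fully-connected weight matrix by truncated SVD, the
# basic operation behind low-rank compression of neural networks.
import numpy as np

def low_rank_factors(W, rank):
    # W ~= A @ B with A of shape (out, rank) and B of shape (rank, in),
    # so one dense layer can be replaced by two thinner ones.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]  # absorb the singular values into A
    B = Vt[:rank, :]
    return A, B

# Usage: compress a 512x1024 layer to rank 64 and check the error.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 1024))
A, B = low_rank_factors(W, rank=64)
x = rng.standard_normal(1024)
print(np.linalg.norm(W @ x - A @ (B @ x)))  # residual on one input

Replacing a dense (out x in) layer with rank-r factors cuts its multiply-accumulates from out*in to r*(out+in), which is the accuracy-versus-MACs trade-off the Dataset Splits row refers to.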