Neural Epitome Search for Architecture-Agnostic Network Compression

Authors: Daquan Zhou, Xiaojie Jin, Qibin Hou, Kaixin Wang, Jianchao Yang, Jiashi Feng

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that, on ImageNet, when taking MobileNetV2 as backbone, our approach improves the full-model baseline by 1.47% in top-1 accuracy with 25% MAdd reduction, and with the same compression ratio, improves AutoML for Model Compression (AMC) by 2.5% in top-1 accuracy.
Researcher Affiliation | Collaboration | Daquan Zhou¹, Xiaojie Jin², Qibin Hou¹, Kaixin Wang¹, Jianchao Yang², Jiashi Feng¹ (¹Department of Electrical and Computer Engineering, National University of Singapore; ²Bytedance Inc., Mountain View, USA)
Pseudocode | No | The paper provides mathematical formulations and descriptive text for its methods but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code can be found at https://github.com/zhoudaquan/NES.
Open Datasets | Yes | We conduct extensive experiments on CIFAR-10 (Krizhevsky & Hinton, 2009) and ImageNet (Deng et al., 2009).
Dataset Splits | No | The paper mentions using CIFAR-10 and ImageNet for experiments but does not explicitly provide details on how the datasets were split into training, validation, and test sets (e.g., percentages, absolute counts, or references to predefined standard splits).
Hardware Specification | No | The paper does not provide any specific hardware details, such as GPU or CPU models, memory, or cloud instance types, used for conducting the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies, such as programming language versions or library versions (e.g., PyTorch 1.9, TensorFlow 2.x), needed to reproduce the experiments.
Experiment Setup | Yes | Detailed experiment settings can be found in Appendix A. For all experiments, we do not use additional training tricks, including the squeeze-and-excitation module (Hu et al., 2018) and the Swish activation function (Ramachandran et al., 2017), which could further improve the results, unless they are used in the backbone model originally. ... The routing map is built as a look-up table during the training phase by recording the moving average of the output index from the index learner $\eta$. For example, the starting index pair shown in Figure 2 can be fetched via $(p_t, q_t) = \mathcal{M}(i, j, m)$, where $(i, j, m)$ is the spatial location in the weight tensor and $(p_t, q_t)$ is the starting index of the selected sub-tensor in the epitome at training epoch $t$. The routing map $\mathcal{M}$ is constructed via Eqn. (6) below with momentum $\mu$ during the training phase; $\mu$ is treated as a hyper-parameter and is decided empirically (we set $\mu$ to 0.97 in our experiments): $\mathcal{M}(i, j, m) = (p_{t-1}, q_{t-1}) + \mu\,\eta(x)$. (6)
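
To make the Eqn. (6) update concrete, below is a minimal Python sketch of the routing map as a momentum-updated look-up table. The `RoutingMap` class, its tensor shapes, and the way the index learner's output is passed in (`eta_out`) are illustrative assumptions for this note, not the authors' released implementation (see the repository linked above for that).

```python
import numpy as np

MU = 0.97  # momentum hyper-parameter mu, as reported in the paper

class RoutingMap:
    """Hypothetical look-up table M: a weight-tensor location (i, j, m)
    maps to the starting index (p, q) of the selected epitome sub-tensor."""

    def __init__(self, height, width, channels):
        # One (p, q) pair per spatial location in the weight tensor.
        self.table = np.zeros((height, width, channels, 2), dtype=np.float64)

    def update(self, i, j, m, eta_out, mu=MU):
        # Eqn. (6): M(i, j, m) = (p_{t-1}, q_{t-1}) + mu * eta(x),
        # where eta_out is the index learner's output for this location.
        self.table[i, j, m] += mu * np.asarray(eta_out, dtype=np.float64)

    def lookup(self, i, j, m):
        # Fetch (p_t, q_t) = M(i, j, m), rounded to integer epitome indices.
        p, q = self.table[i, j, m]
        return int(round(p)), int(round(q))

# Toy usage: record one training-step prediction, then query the map.
rmap = RoutingMap(height=3, width=3, channels=8)
rmap.update(i=0, j=1, m=4, eta_out=(1.0, 2.0))
print(rmap.lookup(0, 1, 4))  # -> (1, 2)
```

Rounding the accumulated values to integer epitome indices at look-up time is likewise an assumption; the excerpt does not spell out how fractional indices are discretized.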