Fractional Skipping: Towards Finer-Grained Dynamic CNN Inference

Authors: Jianghao Shen, Yue Wang, Pengfei Xu, Yonggan Fu, Zhangyang Wang, Yingyan Lin

AAAI 2020, pp. 5700-5708 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results demonstrate the superior tradeoff between computational cost and model expressive power (accuracy) achieved by DFS.
Researcher Affiliation | Academia | Jianghao Shen (1,2), Yue Wang (1), Pengfei Xu (1), Yonggan Fu (1), Zhangyang Wang (2), Yingyan Lin (1); (1) Rice University, (2) Texas A&M University; {nie, atlaswang}@tamu.edu, {yw68, px5, yf22, yingyan.lin}@rice.edu
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our source code and supplementary material are available at https://github.com/Torment123/DFS.
Open Datasets | Yes | Models and Datasets: We evaluate the DFS method using ResNet38 and ResNet74 as the backbone models on two datasets: CIFAR-10 and CIFAR-100.
Dataset Splits | No | The paper uses CIFAR-10 and CIFAR-100 but does not explicitly state the training, validation, and test splits with percentages, sample counts, or a specific split methodology.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | Training Details: The training of DFS follows the two-step procedure described in Section 3. For the first step, we set the initial learning rate to 0.1 and train the gating network for a total of 64000 iterations; the learning rate is reduced by 10 after the 32000-th iteration, and further reduced by 10 after the 48000-th iteration. The specified computation budget is set to 100%. The hyperparameters, including the momentum, weight decay factor, and batch size, are set to 0.9, 1e-4, and 128, respectively, and the absolute value of α in Equation (2) is set to 5e-6.
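The step-decay learning-rate schedule reported above (start at 0.1, divide by 10 after the 32000-th and again after the 48000-th iteration) can be sketched as a small plain-Python helper. This is a minimal illustration of the reported schedule only; the function name, signature, and milestone handling are assumptions, not taken from the paper's released code.

```python
def learning_rate(iteration, base_lr=0.1, milestones=(32000, 48000), gamma=0.1):
    """Step-decay schedule matching the reported DFS setup:
    multiply the learning rate by `gamma` (0.1) once each milestone
    iteration has been passed. Names here are illustrative."""
    lr = base_lr
    for m in milestones:
        if iteration >= m:
            lr *= gamma
    return lr

# Sampled values over the 64000-iteration run:
# iterations [0, 32000) -> 0.1
# iterations [32000, 48000) -> 0.01
# iterations [48000, 64000) -> 0.001
```

The remaining reported hyperparameters (momentum 0.9, weight decay 1e-4, batch size 128) would be passed to the optimizer and data loader in whatever framework is used; the paper does not name one.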