Shallow-Deep Networks: Understanding and Mitigating Network Overthinking

Authors: Yigitcan Kaya, Sanghyun Hong, Tudor Dumitras

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply SDN to four modern architectures, trained on three image classification tasks, to characterize the overthinking problem. We show that SDNs can mitigate the wasteful effect of overthinking with confidence-based early exits, which reduce the average inference cost by more than 50% and preserve the accuracy.
Researcher Affiliation | Academia | University of Maryland, Maryland, USA.
Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | We also release all of our source code: www.shallowdeep.network
Open Datasets | Yes | In our experiments, we use three datasets for benchmarking: CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009) and Tiny ImageNet (Deng et al., 2009).
Dataset Splits | Yes | CIFAR-10 and CIFAR-100 images are drawn from 10 and 100 classes, respectively; containing 50,000 training and 10,000 validation images. The Tiny ImageNet dataset consists of a subset of ImageNet images (Deng et al., 2009), resized at 64x64 pixels. There are 200 classes, each of which has 500 training and 50 validation images.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or cloud instance types used for running experiments.
Software Dependencies | No | The paper mentions using the Adam optimizer and refers to various network architectures, but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow versions).
Experiment Setup | Yes | We train the CNNs for 100 epochs, using the hyper-parameters the original studies describe. To apply SDNs to pretrained networks, we train the internal classifiers for 25 epochs, using the Adam optimizer (Kingma & Ba, 2014). If we start training a modified network from scratch, we train for 100 epochs; the same as the original networks.
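For context on the "confidence-based early exits" quoted under Research Type above, the following is a minimal sketch of how such an exit rule can be implemented. It is not the authors' released code (see www.shallowdeep.network for that); the attribute names `feature_blocks`, `internal_classifiers`, and `final_classifier`, as well as the threshold value, are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def early_exit_inference(sdn_model, x, threshold=0.8):
    """Forward pass that stops at the first internal classifier whose softmax
    confidence exceeds `threshold`; assumes a single input (batch size 1).

    `sdn_model` is assumed to expose `feature_blocks` (a list of backbone
    stages), `internal_classifiers` (one per stage), and `final_classifier`.
    """
    out = x
    for block, clf in zip(sdn_model.feature_blocks, sdn_model.internal_classifiers):
        out = block(out)
        probs = F.softmax(clf(out), dim=1)
        confidence, prediction = probs.max(dim=1)
        if confidence.item() >= threshold:
            # Exit early: the remaining, more expensive blocks are skipped.
            return prediction, confidence
    # No internal classifier was confident enough; use the original final classifier.
    probs = F.softmax(sdn_model.final_classifier(out), dim=1)
    confidence, prediction = probs.max(dim=1)
    return prediction, confidence
```

Under this rule, inputs that are classified confidently at an early stage never reach the deeper layers, which is how the average inference cost can drop while accuracy is preserved; the threshold controls that trade-off.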
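The Experiment Setup row quotes the procedure for attaching SDNs to pretrained networks (internal classifiers trained for 25 epochs with Adam). A minimal sketch of that setup, under the same assumed attribute names as above, is given below; the learning rate and loss weighting are assumptions, since the quoted text does not specify them.

```python
import torch
import torch.nn as nn

def train_internal_classifiers(sdn_model, train_loader, epochs=25, lr=1e-3, device="cuda"):
    """Train only the internal classifiers of a pretrained network with Adam,
    keeping the original backbone weights frozen (illustrative sketch)."""
    sdn_model.to(device)
    # Freeze the pretrained backbone so only the internal classifiers learn.
    for p in sdn_model.feature_blocks.parameters():
        p.requires_grad = False
    optimizer = torch.optim.Adam(sdn_model.internal_classifiers.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            out = images
            loss = 0.0
            # Sum the cross-entropy losses of all internal classifiers.
            for block, clf in zip(sdn_model.feature_blocks, sdn_model.internal_classifiers):
                out = block(out)
                loss = loss + criterion(clf(out), labels)
            loss.backward()
            optimizer.step()
```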