Shallow-Deep Networks: Understanding and Mitigating Network Overthinking
Authors: Yigitcan Kaya, Sanghyun Hong, Tudor Dumitras
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply SDN to four modern architectures, trained on three image classification tasks, to characterize the overthinking problem. We show that SDNs can mitigate the wasteful effect of overthinking with confidence-based early exits, which reduce the average inference cost by more than 50% and preserve the accuracy. |
| Researcher Affiliation | Academia | University of Maryland, Maryland, USA. |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | We also release all of our source code (www.shallowdeep.network). |
| Open Datasets | Yes | In our experiments, we use three datasets for benchmarking: CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009) and Tiny ImageNet (Deng et al., 2009) |
| Dataset Splits | Yes | CIFAR-10 and CIFAR-100 images are drawn from 10 and 100 classes, respectively; containing 50,000 training and 10,000 validation images. The Tiny ImageNet dataset consists of a subset of ImageNet images (Deng et al., 2009), resized at 64x64 pixels. There are 200 classes, each of which has 500 training and 50 validation images. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or cloud instance types used for running experiments. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and refers to various network architectures, but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow versions). |
| Experiment Setup | Yes | We train the CNNs for 100 epochs, using the hyper-parameters the original studies describe. To apply SDNs to pretrained networks, we train the internal classifiers for 25 epochs, using the Adam optimizer (Kingma & Ba, 2014). If we start training a modified network from scratch, we train for 100 epochs; the same as the original networks. |
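The "Research Type" row above quotes the paper's claim that confidence-based early exits cut average inference cost by more than 50% while preserving accuracy. The sketch below illustrates the general idea of such an exit rule; it is not the authors' released implementation, and the module names (`backbone_blocks`, `internal_classifiers`, `final_classifier`), the softmax-confidence rule, and the threshold value are all illustrative assumptions.

```python
import torch.nn.functional as F

def early_exit_inference(backbone_blocks, internal_classifiers, final_classifier,
                         x, confidence_threshold=0.9):
    """Forward a single input (batch size 1), exiting at the first internal
    classifier whose softmax confidence exceeds the threshold."""
    # Assumes one hypothetical exit head per listed backbone stage; the paper's
    # actual attachment points and thresholds are not reproduced here.
    for block, head in zip(backbone_blocks, internal_classifiers):
        x = block(x)                                    # next backbone stage
        logits = head(x)                                # internal classifier prediction
        confidence, prediction = F.softmax(logits, dim=1).max(dim=1)
        if confidence.item() >= confidence_threshold:
            return prediction, logits                   # early exit: skip remaining layers
    # No internal classifier was confident enough; use the final (deepest) output.
    logits = final_classifier(x)
    return logits.argmax(dim=1), logits
```

Inputs that exit early never pay for the deeper layers, which is where the reported reduction in average inference cost comes from.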
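The "Experiment Setup" row states that internal classifiers attached to pretrained networks are trained for 25 epochs with Adam. A minimal sketch of such a training loop is below, assuming a PyTorch-style model; the attribute names (`backbone`, `internal_classifiers`), the learning rate, and the plain sum of per-exit losses are placeholders rather than details taken from the paper or its released code.

```python
import torch
from torch import nn, optim

def train_internal_classifiers(sdn_model, train_loader, epochs=25, lr=1e-3, device="cuda"):
    """Train only the attached exit heads of a pretrained network
    (25 epochs, Adam, as described in the Experiment Setup row)."""
    sdn_model.to(device)
    for p in sdn_model.backbone.parameters():
        p.requires_grad = False                          # keep pretrained weights fixed

    params = [p for head in sdn_model.internal_classifiers for p in head.parameters()]
    optimizer = optim.Adam(params, lr=lr)                # lr is an assumed placeholder
    criterion = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            # Assumes the model returns one logits tensor per internal classifier.
            all_logits = sdn_model(images)
            loss = sum(criterion(logits, labels) for logits in all_logits)
            loss.backward()
            optimizer.step()
```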