Empowering Adaptive Early-Exit Inference with Latency Awareness

Authors: Xinrui Tan, Hongjia Li, Liming Wang, Xueqing Huang, Zhen Xu

AAAI 2021

Reproducibility Variable Result LLM Response
Research Type Experimental Empirically, on top of various models across multiple datasets (CIFAR-10, CIFAR-100, ImageNet and two time-series datasets), we show that our method can well handle the average latency requirements, and consistently finds good threshold settings in negligible time.
Researcher Affiliation Academia Xinrui Tan,¹ Hongjia Li,¹ Liming Wang,¹ Xueqing Huang,² Zhen Xu¹ — ¹Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; ²New York Institute of Technology, New York, USA. {tanxinrui, lihongjia, wangliming, xuzhen}@iie.ac.cn, xhuang25@nyit.edu
Pseudocode Yes Algorithm 1 iPPP method for threshold determination
Open Source Code Yes Data and code to reproduce all results are available at https://github.com/XinruiTan/AAAI21.
Open Datasets Yes We evaluate the effectiveness and efficiency of our threshold determination method using three representative object recognition early-exit models, i.e., a B-AlexNet (Teerapittayanon, McDanel, and Kung 2016) on CIFAR-10, an S-ResNet-18 (Zhang et al. 2019) on CIFAR-100, and a MSDNet (Huang et al. 2017) on ImageNet. ... two Long Short-Term Memory (LSTM) models, respectively on two time-series datasets, namely Google-30 (Warden 2018) and DSA-19 (Altun, Barshan, and Tuncel 2010)
Dataset Splits Yes For each trained model, we use the validation set to determine the thresholds under different average latency requirements, and evaluate our method against the baseline methods not only on the validation set to examine whether our method can well tackle our target optimization problem, but also on the test set to examine whether the threshold settings produced by our method can generalize well beyond the data used for threshold determination.
Hardware Specification Yes We implement our method in Python, and measure its execution time on an Intel quad-core 2.9 GHz CPU.
Software Dependencies No The paper states, "We implement our method in Python", but it does not specify the version number of Python or any other software libraries or dependencies used (e.g., PyTorch, TensorFlow, specific Python packages with versions).
Experiment Setup Yes In our experiments on the three object recognition models, we set k = 30, T = 100, γ = 10, J = 10, µ = 1, η = 100 and use an all-ones vector as the search direction d for our warm-start strategy. Since the latency metrics of the three models are different, we choose β = 2 × 10^7 for B-AlexNet, β = 500 for S-ResNet-18 and β = 10^14 for MSDNet. For the experiments on the two LSTM models, we increase k to 100 and keep all other algorithm parameters the same as in the experiments on S-ResNet-18.
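For context, the confidence-threshold mechanism that these tuned thresholds control in an early-exit model can be sketched as follows. This is a minimal illustration assuming a standard max-softmax confidence rule at each exit; the function names are hypothetical and not taken from the paper's released code.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def early_exit_predict(exit_logits, thresholds):
    """Return (predicted_class, exit_index) for one sample.

    exit_logits: per-exit logit vectors, ordered shallow -> deep.
    thresholds:  one confidence threshold per non-final exit
                 (the quantities the paper's method determines).
    """
    for i, logits in enumerate(exit_logits[:-1]):
        probs = softmax(logits)
        if max(probs) >= thresholds[i]:
            # Confident enough: stop here and skip the deeper layers.
            return probs.index(max(probs)), i
    # No early exit fired: fall through to the final classifier.
    final = softmax(exit_logits[-1])
    return final.index(max(final)), len(exit_logits) - 1
```

Lower thresholds let more samples exit at shallow classifiers, reducing average latency at some cost in accuracy; the paper's contribution is searching for a threshold vector that meets a given average-latency requirement, rather than this inference loop itself.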