Empowering Adaptive Early-Exit Inference with Latency Awareness
Authors: Xinrui Tan, Hongjia Li, Liming Wang, Xueqing Huang, Zhen Xu
AAAI 2021, pp. 9825-9833
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, on top of various models across multiple datasets (CIFAR-10, CIFAR-100, ImageNet and two time-series datasets), we show that our method can well handle the average latency requirements, and consistently finds good threshold settings in negligible time. |
| Researcher Affiliation | Academia | Xinrui Tan¹, Hongjia Li¹, Liming Wang¹, Xueqing Huang², Zhen Xu¹; ¹Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; ²New York Institute of Technology, New York, USA; {tanxinrui, lihongjia, wangliming, xuzhen}@iie.ac.cn, xhuang25@nyit.edu |
| Pseudocode | Yes | Algorithm 1: iPPP method for threshold determination |
| Open Source Code | Yes | Data and code to reproduce all results are available at https://github.com/XinruiTan/AAAI21. |
| Open Datasets | Yes | We evaluate the effectiveness and efficiency of our threshold determination method using three representative object recognition early-exit models, i.e., a B-AlexNet (Teerapittayanon, McDanel, and Kung 2016) on CIFAR-10, an S-ResNet-18 (Zhang et al. 2019) on CIFAR-100, and a MSDNet (Huang et al. 2017) on ImageNet. ... two Long Short-Term Memory (LSTM) models, respectively on two time-series datasets, namely Google-30 (Warden 2018) and DSA-19 (Altun, Barshan, and Tuncel 2010) |
| Dataset Splits | Yes | For each trained model, we use the validation set to determine the thresholds under different average latency requirements, and evaluate our method against the baseline methods not only on the validation set to examine whether our method can well tackle our target optimization problem, but also on the test set to examine whether the threshold settings produced by our method can generalize well beyond the data used for threshold determination. |
| Hardware Specification | Yes | We implement our method in Python, and measure its execution time on an Intel quad-core 2.9 GHz CPU. |
| Software Dependencies | No | The paper states, "We implement our method in Python", but it does not specify the version number of Python or any other software libraries or dependencies used (e.g., PyTorch, TensorFlow, specific Python packages with versions). |
| Experiment Setup | Yes | In our experiments on the three object recognition models, we set k = 30, T = 100, γ = 10, J = 10, µ = 1, η = 100 and use the all-ones vector as the search direction d for our warm-start strategy. Since the latency metrics of the three models are different, we choose β = 2×10⁷ for B-AlexNet, β = 500 for S-ResNet-18 and β = 10¹⁴ for MSDNet. For the experiments on the two LSTM models, we increase k to 100 and keep all other algorithm parameters the same as in the experiments on S-ResNet-18. |
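
The Dataset Splits and Experiment Setup rows together imply the core evaluation step of the paper's threshold determination: a candidate threshold vector is scored on the validation set for accuracy and average latency against a latency budget. Below is a minimal, hypothetical Python sketch of that evaluation step, not the authors' released code; the toy per-exit statistics, the function names, and the confidence-threshold exit rule are illustrative assumptions.

```python
# Hypothetical sketch (not the authors' released code): scoring one candidate
# threshold vector for an early-exit model on a validation set, under an
# average-latency budget. Exit i fires when its confidence on a sample clears
# thresholds[i]; the final exit always fires.
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for per-exit validation statistics: confidences[n, i] is the
# confidence of exit i on sample n; correct[n, i] records whether exit i would
# classify sample n correctly; latency[i] is the cumulative cost of exit i.
num_samples, num_exits = 1000, 4
confidences = rng.uniform(size=(num_samples, num_exits))
correct = rng.uniform(size=(num_samples, num_exits)) < 0.9
latency = np.array([1.0, 2.0, 3.0, 4.0])  # arbitrary latency units

def evaluate(thresholds, budget):
    """Return (accuracy, mean latency, feasible) for one threshold setting."""
    acc_hits, total_latency = 0, 0.0
    for n in range(num_samples):
        # Take the first exit whose confidence clears its threshold;
        # fall back to the last exit, which has no threshold.
        fired = num_exits - 1
        for i in range(num_exits - 1):
            if confidences[n, i] >= thresholds[i]:
                fired = i
                break
        acc_hits += correct[n, fired]
        total_latency += latency[fired]
    mean_latency = total_latency / num_samples
    return acc_hits / num_samples, mean_latency, mean_latency <= budget

acc, lat, ok = evaluate(thresholds=np.array([0.8, 0.7, 0.6]), budget=2.5)
print(f"accuracy={acc:.3f}  mean latency={lat:.3f}  within budget={ok}")
```

In the paper, this kind of validation-set scoring is what the iPPP search (Algorithm 1, with the k, T, γ, J, µ, η and β parameters quoted above) optimizes over; the test set is then used only to check that the chosen thresholds generalize.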