Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Not All Layers of LLMs Are Necessary During Inference
Authors: Siqi Fan, Xin Jiang, Xiang Li, Xuying Meng, Peng Han, Shuo Shang, Aixin Sun, Yequan Wang
IJCAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on well-known LLMs such as the Llama2 series and OPT show that AdaInfer achieves an average pruning ratio of 17.8%, and up to 43% on sentiment tasks, with a negligible performance drop (<1%). The paper also includes a dedicated section titled '5 Experiments' detailing experimental settings, main results, and analysis. |
| Researcher Affiliation | Academia | (1) University of Electronic Science and Technology of China, Chengdu, China; (2) Beijing Academy of Artificial Intelligence, Beijing, China; (3) Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; (4) School of Computer Science and Engineering, Nanyang Technological University, Singapore |
| Pseudocode | No | The paper describes the AdaInfer algorithm in Section 4 and illustrates its workflow in Figure 2a, but does not present it in a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the AdaInfer methodology. |
| Open Datasets | Yes | Question Answering Tasks. (1) MMLU [Hendrycks et al., 2021]... (2) CommonsenseQA [Talmor et al.]... (3) SQuAD [Rajpurkar et al., 2016]... Text Classification Tasks. (1) SST-2 [Socher et al., 2013]... (2) AG News [Zhang et al., 2015]... |
| Dataset Splits | No | The paper refers to using 'test set' and 'training set examples' for evaluation and in-context learning, and mentions 'sample sizes of 5, 10, 15, and 20' for few-shot scenarios. However, it does not provide specific dataset split percentages, explicit sample counts for train/validation/test sets, or detailed partitioning methodology. |
| Hardware Specification | Yes | Table 3 compares the runtime of AdaInfer with a dense implementation on MMLU and Sentiment tasks (5-shot, batch size set to 1), using 6 V100 GPUs (32 GB). |
| Software Dependencies | No | The paper states, 'We utilized the sklearn library for training SVM and CRF, adhering to their default configurations,' but does not provide specific version numbers for these libraries. |
| Experiment Setup | Yes | In-Context Learning Setting. We evaluate AdaInfer under zero-shot and few-shot scenarios, using sample sizes of 5, 10, 15, and 20. ... For in-context learning prompts, we use a default template: `Q: {x_k}\nA: {y_k}\n\n`, concatenating random (x_k, y_k) samples from task-specific training sets. ... We utilized the sklearn library for training SVM and CRF, adhering to their default configurations. ... batch size set to 1, using 6 V100 GPUs (32 GB). |
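The experiment-setup row quotes the paper's default in-context learning template, `Q: {x_k}\nA: {y_k}\n\n`, filled with random (x_k, y_k) pairs drawn from the task's training set. A minimal sketch of how such a prompt could be assembled (function and argument names are illustrative assumptions, not from the paper):

```python
import random

def build_icl_prompt(train_examples, query, n_shot=5, seed=0):
    """Assemble a few-shot prompt using the paper's default template,
    Q: {x_k}\nA: {y_k}\n\n, concatenating randomly drawn (x_k, y_k)
    pairs from the task-specific training set, then appending the query."""
    rng = random.Random(seed)  # fixed seed for reproducible shot selection
    shots = rng.sample(train_examples, n_shot)
    demo = "".join(f"Q: {x}\nA: {y}\n\n" for x, y in shots)
    return demo + f"Q: {query}\nA:"
```

The paper evaluates with n_shot values of 5, 10, 15, and 20; the seeded RNG here is an added convenience so runs can be repeated deterministically.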