Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Not All Layers of LLMs Are Necessary During Inference

Authors: Siqi Fan, Xin Jiang, Xiang Li, Xuying Meng, Peng Han, Shuo Shang, Aixin Sun, Yequan Wang

IJCAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on well-known LLMs such as the Llama 2 series and OPT show that AdaInfer achieves an average pruning ratio of 17.8%, and up to 43% on sentiment tasks, with nearly no performance drop (<1%). The paper also includes a dedicated section titled '5 Experiments' detailing experimental settings, main results, and analysis.
Researcher Affiliation | Academia | 1 University of Electronic Science and Technology of China, Chengdu, China; 2 Beijing Academy of Artificial Intelligence, Beijing, China; 3 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; 4 School of Computer Science and Engineering, Nanyang Technological University, Singapore
Pseudocode | No | The paper describes the AdaInfer algorithm in Section 4 and illustrates its workflow in Figure 2a, but does not present it in a structured pseudocode or algorithm block.
Open Source Code | No | The paper does not provide an explicit statement about releasing source code, nor a link to a code repository for the AdaInfer method.
Open Datasets | Yes | Question Answering Tasks: (1) MMLU [Hendrycks et al., 2021]... (2) CommonsenseQA [Talmor et al., ]... (3) SQuAD [Rajpurkar et al., 2016]... Text Classification Tasks: (1) SST-2 [Socher et al., 2013]... (2) AG News [Zhang et al., 2015]...
Dataset Splits | No | The paper refers to using the 'test set' and 'training set examples' for evaluation and in-context learning, and mentions 'sample sizes of 5, 10, 15, and 20' for few-shot scenarios, but it provides no dataset split percentages, explicit sample counts for train/validation/test sets, or partitioning methodology.
Hardware Specification | Yes | Table 3 compares the runtime of AdaInfer with a dense implementation on MMLU and sentiment tasks (5-shot, batch size 1), using 6 V100 GPUs (32 GB).
Software Dependencies | No | The paper states, 'We utilized the sklearn library for training SVM and CRF, adhering to their default configurations,' but does not provide version numbers for these libraries.
Experiment Setup | Yes | In-Context Learning Setting: We evaluate AdaInfer under zero-shot and few-shot scenarios, using sample sizes of 5, 10, 15, and 20. ... For in-context learning prompts, we use a default template, Q : {xk}\n A : {yk}\n\n, concatenating random xk and yk samples from task-specific training sets. ... We utilized the sklearn library for training SVM and CRF, adhering to their default configurations. ... Batch size is set to 1, using 6 V100 GPUs (32 GB).
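
The prompt construction quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration of the reported default template (Q : {xk}\n A : {yk}\n\n with demonstrations drawn at random from a task's training set); the `build_prompt` helper, the fixed seed, and the toy sentiment examples are assumptions for illustration, not artifacts from the paper.

```python
import random

def build_prompt(train_examples, query, k=5, seed=0):
    """Assemble a k-shot in-context learning prompt using the template
    'Q : {xk}\\n A : {yk}\\n\\n', then append the unanswered query."""
    rng = random.Random(seed)  # seeded for repeatability (an assumption)
    shots = rng.sample(train_examples, k)  # random (x, y) demonstrations
    demos = "".join(f"Q : {x}\n A : {y}\n\n" for x, y in shots)
    return demos + f"Q : {query}\n A :"

# Toy sentiment-style training pairs (illustrative only)
train = [("great movie!", "positive"), ("dull plot", "negative"),
         ("loved every minute", "positive"), ("waste of time", "negative")]
prompt = build_prompt(train, "a charming film", k=2)
print(prompt)
```

The model's next-token continuation after the trailing `A :` is then read off as the prediction; in the zero-shot setting, `k=0` would leave only the final query line.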