Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Zeroth-Order Adaptive Neuron Alignment Based Pruning without Re-Training
Authors: Elia Cunegatti, Leonardo Lucio Custode, Giovanni Iacca
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our method over 300 test cases with four LLM families, three sparsity ratios, and ten language tasks (three language modeling and seven zero-shot datasets), showing how it consistently outperforms the latest state-of-the-art methods in terms of performance-runtime trade-off. |
| Researcher Affiliation | Collaboration | Elia Cunegatti (University of Trento, Italy); Leonardo Lucio Custode (Independent Researcher); Giovanni Iacca (University of Trento, Italy) |
| Pseudocode | Yes | Figure 3: Left: Overall NeuronAL top-up pruning procedure. Right: Get Best NeuronAL sub-routine used in both the block- and row-selection stages. |
| Open Source Code | Yes | The code is available at https://github.com/eliacunegatti/NeuroAL. |
| Open Datasets | Yes | Language Modeling Datasets: To measure the models' perplexity on language modeling datasets, we use the following three datasets: (1) WikiText2 (Merity et al., 2017), (2) Colossal Clean Common Crawl (C4) (Raffel et al., 2020), and (3) Penn Treebank (PTB). Zero-Shot Tasks: To assess more thoroughly how the different pruning algorithms affect the models' capabilities, we employ the following 7 datasets: (1) Recognizing Textual Entailment (RTE) (Dagan et al., 2006; Bar-Haim et al., 2006; Giampiccolo et al., 2007; Bentivogli et al., 2009), (2) WinoGrande (Sakaguchi et al., 2021), (3) BoolQ (Clark et al., 2019), (4) HellaSwag (Zellers et al., 2019), (5) ARC-e (Clark et al., 2018), (6) ARC-c (Clark et al., 2018), (7) OBQA (Mihaylov et al., 2018). |
| Dataset Splits | Yes | For all the pruning algorithms that use calibration data (i.e., multiflow, Wanda, and SparseGPT), we use 128 samples from the C4 dataset, as in (Frantar & Alistarh, 2023; Sun et al., 2023; Yin et al., 2024). ... For both C and Cλ, we use the same seed (0) for the calibration set, i.e., Cλ contains the first 8 elements of C. |
| Hardware Specification | Yes | All the experiments have been run on NVIDIA A100 GPUs, with both 40 and 80 GB of memory. ... The evaluation consists of end-to-end token generation and has been done on an Intel Core i9-10980XE CPU using 18 cores. |
| Software Dependencies | No | The paper mentions an 'inference pipeline based on DeepSparse (Neural Magic, 2021) ONNXRuntime backends' but does not specify version numbers for these or other software libraries. |
| Experiment Setup | Yes | For OWL, we set the hyperparameters to the values that are used mostly in the original paper, hence M = 5 and λ = 0.08; we do the same for AlphaPruning, setting ϵ = 0.3. ... In the experiments, we set λ_set = [0.01, 0.02, 0.03, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.12, 0.15, 0.20, 0.25] for the block step, while for the row step, we also added 0.0 (in case of no performance improvement). ... For both C and Cλ, we use the same seed (0) for the calibration set. |
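
The calibration and λ-search settings quoted in the Dataset Splits and Experiment Setup rows can be summarized as a small configuration sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the constant names and the helpers `small_calibration_set` and `select_best_lambda` are hypothetical, and the `score` callable is a placeholder for the selection criterion used by the paper's Get Best NeuronAL sub-routine, which is not reproduced here (higher scores are assumed to be better).

```python
from typing import Callable, Sequence

# Values as reported: 128 C4 samples for C, seed 0, and C_lambda = first 8 of C.
CALIBRATION_SIZE = 128
CALIBRATION_SEED = 0
SMALL_CALIBRATION_SIZE = 8

# Candidate lambdas reported for the block step.
LAMBDA_SET_BLOCK = [0.01, 0.02, 0.03, 0.05, 0.06, 0.07, 0.08, 0.09,
                    0.1, 0.12, 0.15, 0.20, 0.25]
# The row step additionally considers 0.0 (the "no improvement" case).
LAMBDA_SET_ROW = [0.0] + LAMBDA_SET_BLOCK


def small_calibration_set(calibration: Sequence) -> list:
    """Hypothetical helper: return C_lambda, the first 8 elements of C."""
    return list(calibration[:SMALL_CALIBRATION_SIZE])


def select_best_lambda(candidates: Sequence[float],
                       score: Callable[[float], float]) -> float:
    """Hypothetical grid search over candidate lambdas.

    `score` is a placeholder (assumption) for the paper's selection
    criterion evaluated on C_lambda. Python's max() returns the first
    maximal element, so ties keep the earliest candidate in the list.
    """
    return max(candidates, key=score)
```

For example, with a placeholder score that never rewards sparsity redistribution, `select_best_lambda(LAMBDA_SET_ROW, lambda lam: -lam)` returns 0.0, which corresponds to the row-step fallback described in the Experiment Setup row.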