Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

MODEL SHAPLEY: Find Your Ideal Parameter Player via One Gradient Backpropagation

Authors: Chu Xu, Xinke Jiang, Rihong Qiu, Jiaran Gao, Junfeng Zhao

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we conduct a series of experiments to evaluate the performance of MODEL SHAPLEY against several mainstream neuron importance localization baselines. We perform evaluations in both CV and NLP settings, focusing on three key strategies: deactivation during inference, targeted fine-tuning during training, and model compression. Moreover, we also explore the effects of applying these strategies at different parameter unit granularities, such as neurons, layers, and attention heads. Further experimental details and results are provided in Appendix H.
Researcher Affiliation	Academia	(1) Key Laboratory of High Confidence Software Technologies, Ministry of Education, Beijing, China; (2) Center on Frontiers of Computing Studies, Peking University, Beijing, China; (3) National Engineering Research Center For Software Engineering, Peking University, Beijing, China; (4) School of Computer Science, Peking University, Beijing, China; (5) Big Data Technology Research Center, Nanhu Laboratory, Jiaxing, China
Pseudocode	Yes	Algorithm 1: Shapley-Guided Quantization with Corrected Hessian; Algorithm 2: Parameter-wise Shapley Value Estimation with Gradient Similarity
Open Source Code	Yes	https://github.com/Artessay/Model Shapley
Open Datasets	Yes	Datasets. We utilize a grade school math dataset GSM8K and a multitask language understanding dataset MMLU for NLP tasks to evaluate language transformer models, and use image classification datasets CIFAR-100 and ImageNet for vision transformer models in CV tasks.
Dataset Splits	Yes	In the experimental data processing phase, we strictly adhere to the original training-test set splits provided for each dataset to ensure the reproducibility of results and comparability with prior studies. Specifically, for the original training set of each dataset, we further employ stratified random sampling to partition it into a training subset and a validation subset at an 80%:20% ratio.
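The 80%:20% stratified partition of each original training set can be sketched in a few lines of standard-library Python. This is a minimal sketch, not the authors' code: it assumes the stratification key is a per-example label (for example, the class label for CIFAR-100 or ImageNet), and the function name `stratified_split` and its default seed are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Split example indices into (train, val) so each label keeps ~80:20 proportions.

    Hypothetical helper: `labels[i]` is assumed to be the stratification key of example i.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, val_idx = [], []
    for idxs in by_label.values():
        rng.shuffle(idxs)  # random order within each stratum
        n_val = round(len(idxs) * val_fraction)
        val_idx.extend(idxs[:n_val])
        train_idx.extend(idxs[n_val:])
    return sorted(train_idx), sorted(val_idx)


labels = [0] * 50 + [1] * 30 + [2] * 20
train, val = stratified_split(labels)
print(len(train), len(val))
```

Splitting within each stratum, rather than over the whole set, keeps the label distribution of the validation subset close to that of the training subset; the original test split is left untouched.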
Hardware Specification	Yes	All NLP tasks and CV tasks are tested on an Ubuntu server equipped with 8 NVIDIA A100 GPUs with 80GB memory, and quantization tasks are tested on an Ubuntu server equipped with 8 NVIDIA 4090 GPUs with 48GB memory.
Software Dependencies	Yes	We use PyTorch 2.6.0 library to implement all the algorithms based on the open-source Hugging Face transformers [67] codebase.
Experiment Setup	Yes	For the training experiments, the activative ratio is set to 0.1 for all tasks. We conduct NLP tasks with a learning rate of 3e-5, a max gradient norm of 1.0, a warmup ratio of 1e-3, a batch size of 16 for the GSM8K dataset and 8 for the MMLU dataset, 4 gradient accumulation steps, a cutoff length of 1024 for GSM8K and 2048 for MMLU, a max response length of 1024, and 1 epoch. CV tasks are conducted with a learning rate of 1e-5, a weight decay of 1e-4, a batch size of 512, a maximum of 50 epochs, and an early-stopping strategy with a patience of 5 epochs.
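To make the reported settings concrete, the sketch below records them as plain Python data and works through two consequences: the number of sequences consumed per optimizer update (per-device batch size × gradient accumulation steps × number of data-parallel devices) and a patience-based early-stopping rule for the CV runs. It is a hedged illustration, not the authors' implementation: the device count, the rounding of optimizer steps, the monitored metric (validation loss), and the names `optimizer_steps`, `EarlyStopping`, `epochs_run`, `cutoff_len`, and `max_response_len` are all assumptions.

```python
import math

# Reported NLP hyperparameters. Most keys mirror Hugging Face TrainingArguments
# field names; cutoff_len and max_response_len are hypothetical names for the
# reported sequence-length limits (in tokens).
NLP_CONFIGS = {
    "gsm8k": dict(learning_rate=3e-5, max_grad_norm=1.0, warmup_ratio=1e-3,
                  per_device_train_batch_size=16, gradient_accumulation_steps=4,
                  cutoff_len=1024, max_response_len=1024, num_train_epochs=1),
    "mmlu": dict(learning_rate=3e-5, max_grad_norm=1.0, warmup_ratio=1e-3,
                 per_device_train_batch_size=8, gradient_accumulation_steps=4,
                 cutoff_len=2048, max_response_len=1024, num_train_epochs=1),
}

# Reported CV hyperparameters.
CV_CONFIG = dict(learning_rate=1e-5, weight_decay=1e-4, batch_size=512,
                 max_epochs=50, patience=5)


def optimizer_steps(n_examples, cfg, n_devices=1):
    """Approximate (effective batch, optimizer updates, warmup updates).

    Hypothetical helper: real trainers may round partial accumulation
    windows differently, so treat the counts as estimates.
    """
    effective_batch = (cfg["per_device_train_batch_size"]
                       * cfg["gradient_accumulation_steps"] * n_devices)
    steps = math.ceil(n_examples / effective_batch) * cfg["num_train_epochs"]
    warmup = math.ceil(steps * cfg["warmup_ratio"])
    return effective_batch, steps, warmup


class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement.

    Assumes the monitored quantity is a validation loss (lower is better).
    """

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(val_losses, max_epochs=CV_CONFIG["max_epochs"],
               patience=CV_CONFIG["patience"]):
    """Number of epochs actually trained, given per-epoch validation losses."""
    stopper = EarlyStopping(patience)
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if stopper.step(loss):
            return epoch
    return min(len(val_losses), max_epochs)
```

For instance, with 16 GSM8K sequences per device and 4 accumulation steps, a single device performs one update per 64 sequences; if all 8 A100 GPUs were used in data-parallel fashion (an assumption, since the hardware description does not say how training was distributed), each update would cover 512 sequences. With a warmup ratio of 1e-3, warmup lasts only about one update for any run shorter than 1,000 steps.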