Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Localized Data Shapley: Accelerating Valuation for Nearest Neighbor Algorithms

Authors: Guangyi Zhang, Yanhao Wang, Chengliang Chai, Qiyu Liu, Wei Wang

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on real-life datasets demonstrate that our methods achieve a substantial speedup compared to previous approaches.
Researcher Affiliation | Academia | Guangyi Zhang, Shenzhen Technology University, EMAIL; Yanhao Wang, East China Normal University, EMAIL; Chengliang Chai, Beijing Institute of Technology, EMAIL; Qiyu Liu, Southwest University, EMAIL; Wei Wang, HKUST(GZ) and HKUST, EMAIL
Pseudocode | Yes | Algorithm 1: Fast Data Shapley Value Computation for Threshold-based KNN
Open Source Code | Yes | Our source code is published for reproducibility (footnote 3: https://github.com/Guangyi-Zhang/tknn-data-shapley).
Open Datasets | Yes | Datasets. We used both synthetic and real-world datasets in the experiments. ... We select a collection of real-world datasets as listed in Table A1. ... All the data are publicly available.
Dataset Splits | Yes | The dataset size |D| ranges from 10 K to 1 M, and we set |D_test| to be 0.2%-1% of |D|. Thus, the total size of |D| × |D_test| is up to the order of 10^10.
Hardware Specification | Yes | All experiments were carried out on a Linux server equipped with 64 CPUs of Intel(R) Xeon(R) Platinum 8358P CPU @ 2.60 GHz and 1511 GB RAM.
Software Dependencies | Yes | All algorithms were implemented in Python 3.11.
Experiment Setup | Yes | The default values for the parameters are K = 5, n_L = 50, τ/d = 0.2, σ = 0.1, and FFT for landmark selection.
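
The figures quoted in the Dataset Splits and Experiment Setup rows can be sanity-checked with a short Python sketch. It is illustrative only and is not taken from the authors' repository: the class name `DefaultSetup`, its field names, and the helper `pair_count` are hypothetical, and combining the smallest dataset with the smallest test fraction (and the largest with the largest) is an assumption made to bracket the range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DefaultSetup:
    """Default parameter values as quoted in the table (field names are hypothetical)."""
    k: int = 5                        # K
    n_landmarks: int = 50             # n_L
    tau_over_d: float = 0.2           # tau / d
    sigma: float = 0.1                # sigma
    landmark_selection: str = "FFT"   # label kept exactly as quoted


def pair_count(train_size: int, test_fraction: float) -> int:
    """Return |D| * |D_test|, the number of (training point, test point) pairs."""
    test_size = round(train_size * test_fraction)
    return train_size * test_size


# Assumed extremes of the quoted ranges: 10 K points with a 0.2% test set,
# and 1 M points with a 1% test set.
smallest = pair_count(10_000, 0.002)
largest = pair_count(1_000_000, 0.01)

print(DefaultSetup())
print(smallest, largest)
```

With 1 M training points and a test set of 1% (10 000 points), the pair count is 10^6 × 10^4 = 10^10, which matches the order of magnitude stated in the Dataset Splits row.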