Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Localized Data Shapley: Accelerating Valuation for Nearest Neighbor Algorithms
Authors: Guangyi Zhang, Yanhao Wang, Chengliang Chai, Qiyu Liu, Wei Wang
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on real-life datasets demonstrate that our methods achieve a substantial speedup compared to previous approaches. |
| Researcher Affiliation | Academia | Guangyi Zhang Shenzhen Technology University EMAIL Yanhao Wang East China Normal University EMAIL Chengliang Chai Beijing Institute of Technology EMAIL Qiyu Liu Southwest University EMAIL Wei Wang HKUST(GZ) and HKUST EMAIL |
| Pseudocode | Yes | Algorithm 1: Fast Data Shapley Value Computation for Threshold-based KNN |
| Open Source Code | Yes | Our source code is published for reproducibility (https://github.com/Guangyi-Zhang/tknn-data-shapley). |
| Open Datasets | Yes | Datasets. We used both synthetic and real-world datasets in the experiments. ... We select a collection of real-world datasets as listed in Table A1. ... All the data are publicly available. |
| Dataset Splits | Yes | The dataset size \|D\| ranges from 10 K to 1 M, and we set \|D_test\| to be 0.2%-1% of \|D\|. Thus, the total size of \|D\| × \|D_test\| is up to the order of 10^10. |
| Hardware Specification | Yes | All experiments were carried out on a Linux server equipped with 64 CPUs of Intel(R) Xeon(R) Platinum 8358P CPU @ 2.60 GHz and 1511 GB RAM. |
| Software Dependencies | Yes | All algorithms were implemented in Python 3.11. |
| Experiment Setup | Yes | The default values for the parameters are K = 5, n_L = 50, τ/d = 0.2, σ = 0.1, and FFT for landmark selection. |
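
The Dataset Splits row quotes training-set sizes from 10 K to 1 M and a test set of 0.2%-1% of that size. As a rough check of the stated order of magnitude, the sketch below multiplies the two sizes at both ends of those ranges. The helper name `pair_count` is hypothetical, and pairing the smallest size with the smallest fraction (and the largest with the largest) is an assumption made for illustration, not a configuration reported in the paper.

```python
# Illustrative arithmetic only; `pair_count` is a hypothetical helper,
# not part of the paper's code.
def pair_count(train_size: int, test_fraction: float) -> int:
    """Return the number of (training point, test point) pairs,
    assuming the test set holds test_fraction * train_size points."""
    test_size = round(train_size * test_fraction)
    return train_size * test_size


# Assumed extremes of the ranges quoted in the table.
smallest = pair_count(10_000, 0.002)    # 10 K points, 0.2% test set
largest = pair_count(1_000_000, 0.01)   # 1 M points, 1% test set

print(f"smallest setting: {smallest:.1e} pairs")
print(f"largest setting:  {largest:.1e} pairs")
```

The largest setting, 1 M training points with a 10 K test set, gives 10^10 pairs, which matches the upper bound quoted in the row.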