Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Shapley-Based Data Valuation for Weighted $k$-Nearest Neighbors

Authors: Guangyi Zhang, Qiyu Liu, Aristides Gionis

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We investigate the following research questions in the experiments: (1) How does Algorithm 1 perform compared with the existing methods? We study this question with a task of noisy label detection. See Section 7.1. (2) How does the DkNN-SV deviate from those of the unweighted and weighted kNN-SV formulations? We visualize and compare them in Section 7.2. (3) How is the scalability of Algorithm 1? We study this in Fig. 1b and further in Appendix B.3. (4) What is the effect of the parameters on Algorithm 1? We study this in Fig. 1c and more in Appendix B.4.
Researcher Affiliation	Academia	Guangyi Zhang Shenzhen Technology University EMAIL Qiyu Liu Southwest University EMAIL Aristides Gionis KTH Royal Institute of Technology and Digital Futures EMAIL
Pseudocode	Yes	Algorithm 1: Fast algorithm for duplicate-based weighted kNN-SV
Open Source Code	Yes	Our code can be found at a GitHub repository: https://github.com/Guangyi-Zhang/weighted-knnsv-via-duplication
Open Datasets	Yes	Datasets. We evaluate the proposed methods on 11 datasets, whose statistics are listed in Table A1. The size of the datasets ranges from 5K to 1M. Many of them are chosen to be of a moderate size, so as to allow us to compare with more costly baselines.
Dataset Splits	Yes	By default, we randomly select 1% of the data, up to 100 points, as the testing set. We randomly select 5% of the training data as a validation set. We train a weighted kNN classifier on the rest of the data, and evaluate its accuracy on the validation set.
Hardware Specification	Yes	All algorithms were implemented in Python 3.11. All experiments were carried out on a Linux server equipped with 64 CPUs of Intel(R) Xeon(R) Platinum 8358P CPU @ 2.60 GHz and 1511 GB RAM.
Software Dependencies	Yes	All algorithms were implemented in Python 3.11.
Experiment Setup	Yes	To evaluate the performance of different methods, we follow the setup in previous works [8, 13, 10] and use the task of noisy label detection. For each dataset, we randomly flip the labels of 5% of the training data points, which forms a noisy subset of size n/20. We predict the noisy subset by the top-t data points with the lowest Shapley values. We set t = 500 for all methods. Intuitively, stronger data valuation methods should be able to detect noisy data points more accurately. We tune the key parameter, kernel width σ, of all algorithms that use a kernel function as follows. We randomly select 5% of the training data as a validation set. We train a weighted kNN classifier on the rest of the data, and evaluate its accuracy on the validation set. We select the value σ that gives the highest accuracy from a list of candidates.
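
The noisy label detection protocol quoted under Experiment Setup can be sketched roughly as follows. This is an illustrative sketch, not the authors' code: the function names (`flip_labels`, `predict_noisy`, `precision_recall`, `select_sigma`) are hypothetical, labels are assumed to be integers in `0..num_classes-1`, the rule of flipping to a uniformly chosen different class is an assumption, and precision/recall is an assumed scoring choice since the excerpt does not name the metric. The `values` argument stands in for whatever data values (for example Shapley values) a valuation method returns.

```python
import random


def flip_labels(labels, num_classes, fraction=0.05, seed=0):
    """Flip the labels of a random `fraction` of points to a different class.

    Returns the noisy label list and the set of flipped indices.
    """
    rng = random.Random(seed)
    n_noisy = round(len(labels) * fraction)  # n/20 when fraction = 0.05
    flipped = set(rng.sample(range(len(labels)), n_noisy))
    noisy = list(labels)
    for i in sorted(flipped):
        # Assumed flipping rule: pick any class other than the current one.
        choices = [c for c in range(num_classes) if c != labels[i]]
        noisy[i] = rng.choice(choices)
    return noisy, flipped


def predict_noisy(values, t):
    """Return the indices of the t points with the lowest data values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    return set(order[:t])


def precision_recall(predicted, truth):
    """Score a predicted noisy set against the truly flipped set (assumed metric)."""
    hits = len(predicted & truth)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


def select_sigma(candidates, validation_accuracy):
    """Pick the kernel width with the highest validation accuracy.

    `validation_accuracy` is a caller-supplied function mapping sigma to the
    accuracy of a weighted kNN classifier on the held-out validation set.
    """
    return max(candidates, key=validation_accuracy)
```

In the setup described above, `t` would be 500 and `fraction` would be 0.05; the valuation method under test supplies `values`, and `select_sigma` would be called once per kernel-based method with its list of candidate widths.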