Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Validation Free and Replication Robust Volume-based Data Valuation
Authors: Xinyi Xu, Zhaoxuan Wu, Chuan Sheng Foo, Bryan Kian Hsiang Low
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive experiments to demonstrate its consistency in valuation and practical advantages over existing baselines and show that our method is model- and task-agnostic and can be flexibly adapted to handle various neural networks. |
| Researcher Affiliation | Academia | Dept. of Computer Science, National University of Singapore, Republic of Singapore; Institute of Data Science, National University of Singapore, Republic of Singapore; Integrative Sciences and Engineering Programme, NUSGS, Republic of Singapore; Institute for Infocomm Research, A*STAR, Republic of Singapore |
| Pseudocode | No | No pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | Our code is publicly available at: https://github.com/ZhaoxuanWu/VolumeBased-DataValuation. |
| Open Datasets | Yes | We use two real-world datasets: credit card fraud detection [2] (i.e., transaction amount prediction) and Uber & Lyft [5] (i.e., carpool ride price prediction)... UK used car dataset [1] (i.e., car price prediction) and credit card fraud detection dataset [2] (i.e., transaction amount prediction), TripAdvisor hotel reviews dataset [4], California housing price prediction (CaliH) [20], Kings county housing sales prediction (KingH) [3], US census income prediction (USCensus) [6], and age estimation from facial images (FaceA) [41]. |
| Dataset Splits | Yes | We use 60% of data to construct XS1, XS2, and XS3 and the remaining 40% as the validation set for LOO and VLSV. |
| Hardware Specification | Yes | All experiments have been run on a server with Intel(R) Xeon(R)@ 2.70GHz processor and 256GB RAM. |
| Software Dependencies | No | No specific software dependencies with version numbers were explicitly mentioned. |
| Experiment Setup | Yes | The input features are standardized and we set ω = 0.1. For CaliH, we use the latent features from the last layer of a neural network with 2 fully connected layers of 64 and 10 hidden units and the rectified linear unit (ReLU) as the activation function. |
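
The split and setup rows above can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function names (`split_60_40`, `standardize`, `latent_features`) are hypothetical, and the network weights are left as inputs rather than trained. It also assumes that standardization statistics come from the 60% portion only and that ReLU is applied after both layers, neither of which is stated in the quoted text.

```python
import numpy as np


def split_60_40(X, y, seed=0):
    """Shuffle and split into a 60% pool and a 40% validation set.

    The shuffling and the seed are assumptions; the source only gives the ratio.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(0.6 * len(X))
    return X[idx[:cut]], y[idx[:cut]], X[idx[cut:]], y[idx[cut:]]


def standardize(X_fit, X_other):
    """Standardize both arrays with the mean and std of X_fit (an assumption)."""
    mean = X_fit.mean(axis=0)
    std = X_fit.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid division by zero on constant columns
    return (X_fit - mean) / std, (X_other - mean) / std


def latent_features(X, W1, b1, W2, b2):
    """Forward pass through two fully connected layers (64 then 10 units).

    ReLU after the second layer is an assumption; the weights would come
    from a trained network and are placeholders here.
    """
    hidden = np.maximum(X @ W1 + b1, 0.0)     # first layer, 64 units
    return np.maximum(hidden @ W2 + b2, 0.0)  # second layer, 10 units


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 8))  # synthetic stand-in data, not a real dataset
    y = rng.normal(size=100)

    X_pool, y_pool, X_val, y_val = split_60_40(X, y)
    X_pool_s, X_val_s = standardize(X_pool, X_val)

    # Untrained random weights, for shape illustration only.
    W1, b1 = rng.normal(size=(8, 64)), np.zeros(64)
    W2, b2 = rng.normal(size=(64, 10)), np.zeros(10)
    Z = latent_features(X_pool_s, W1, b1, W2, b2)
    print(X_pool.shape, X_val.shape, Z.shape)
```

In the paper's setting the 60% pool would then be divided among the data sources, and the 40% portion is used only by the validation-based baselines (LOO and VLSV).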