Validation Free and Replication Robust Volume-based Data Valuation
Authors: Xinyi Xu, Zhaoxuan Wu, Chuan Sheng Foo, Bryan Kian Hsiang Low
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive experiments to demonstrate its consistency in valuation and practical advantages over existing baselines and show that our method is model- and task-agnostic and can be flexibly adapted to handle various neural networks. |
| Researcher Affiliation | Academia | Dept. of Computer Science, National University of Singapore, Republic of Singapore; Institute of Data Science, National University of Singapore, Republic of Singapore; Integrative Sciences and Engineering Programme, NUSGS, Republic of Singapore; Institute for Infocomm Research, A*STAR, Republic of Singapore |
| Pseudocode | No | No pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | Our code is publicly available at: https://github.com/ZhaoxuanWu/VolumeBased-DataValuation. |
| Open Datasets | Yes | We use two real-world datasets: credit card fraud detection [2] (i.e., transaction amount prediction) and Uber & Lyft [5] (i.e., carpool ride price prediction)... UK used car dataset [1] (i.e., car price prediction) and credit card fraud detection dataset [2] (i.e., transaction amount prediction), Trip Advisor hotel reviews dataset [4], California housing price prediction (Cali H) [20], Kings county housing sales prediction (King H) [3], US census income prediction (USCensus) [6], and age estimation from facial images (Face A) [41]. |
| Dataset Splits | Yes | We use 60% of data to construct XS1, XS2, and XS3 and the remaining 40% as the validation set for LOO and VLSV. |
| Hardware Specification | Yes | All experiments have been run on a server with Intel(R) Xeon(R) @ 2.70GHz processor and 256GB RAM. |
| Software Dependencies | No | No specific software dependencies with version numbers were explicitly mentioned. |
| Experiment Setup | Yes | The input features are standardized and we set ω = 0.1. For Cali H, we use the latent features from the last layer of a neural network with 2 fully connected layers of 64 and 10 hidden units and the rectified linear unit (ReLU) as the activation function. |
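For concreteness, the data preparation described in the table rows above (feature standardization, the 60/40 split into three parties' subsets XS1, XS2, XS3 plus a validation set, and latent-feature extraction from a 2-layer fully connected network with 64 and 10 hidden units and ReLU) can be sketched as below. This is a minimal illustration, not the authors' actual pipeline: the dataset, weights, and all shapes are random stand-ins, and the network weights would in practice come from training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a tabular dataset (e.g., Cali H); shapes are illustrative.
X = rng.normal(size=(100, 8))

# Standardize input features (zero mean, unit variance per feature).
X = (X - X.mean(axis=0)) / X.std(axis=0)

# 60% of the data forms the three parties' subsets XS1, XS2, XS3;
# the remaining 40% is held out as the validation set used by the
# LOO and validation-based Shapley (VLSV) baselines.
n = len(X)
n_train = int(0.6 * n)
perm = rng.permutation(n)
train_idx, val_idx = perm[:n_train], perm[n_train:]
XS1, XS2, XS3 = np.array_split(X[train_idx], 3)
X_val = X[val_idx]

# Latent features from the last layer of a 2-layer fully connected
# network (64 and 10 hidden units) with ReLU activations; the weights
# here are random stand-ins for a trained network's weights.
W1 = rng.normal(size=(8, 64)) * 0.1
W2 = rng.normal(size=(64, 10)) * 0.1
relu = lambda z: np.maximum(z, 0.0)
latent = relu(relu(XS1 @ W1) @ W2)  # 10-dimensional latent features for XS1
```

In the paper's setting these latent features, rather than the raw inputs, would then feed the volume-based valuation, which needs no validation set.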