Validation Free and Replication Robust Volume-based Data Valuation

Authors: Xinyi Xu, Zhaoxuan Wu, Chuan-Sheng Foo, Bryan Kian Hsiang Low

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform extensive experiments to demonstrate its consistency in valuation and practical advantages over existing baselines and show that our method is model- and task-agnostic and can be flexibly adapted to handle various neural networks.
Researcher Affiliation | Academia | Dept. of Computer Science, National University of Singapore, Republic of Singapore; Institute of Data Science, National University of Singapore, Republic of Singapore; Integrative Sciences and Engineering Programme, NUSGS, Republic of Singapore; Institute for Infocomm Research, A*STAR, Republic of Singapore
Pseudocode | No | No pseudocode or algorithm blocks were found.
Open Source Code | Yes | Our code is publicly available at: https://github.com/ZhaoxuanWu/VolumeBased-DataValuation.
Open Datasets | Yes | We use two real-world datasets: credit card fraud detection [2] (i.e., transaction amount prediction) and Uber & Lyft [5] (i.e., carpool ride price prediction)... UK used car dataset [1] (i.e., car price prediction) and credit card fraud detection dataset [2] (i.e., transaction amount prediction), TripAdvisor hotel reviews dataset [4], California housing price prediction (CaliH) [20], Kings County housing sales prediction (KingH) [3], US census income prediction (USCensus) [6], and age estimation from facial images (FaceA) [41].
Dataset Splits | Yes | We use 60% of data to construct X_S1, X_S2, and X_S3 and the remaining 40% as the validation set for LOO and VLSV. (A minimal split sketch appears after this table.)
Hardware Specification | Yes | All experiments have been run on a server with an Intel(R) Xeon(R) @ 2.70GHz processor and 256GB RAM.
Software Dependencies | No | No specific software dependencies with version numbers were explicitly mentioned.
Experiment Setup | Yes | The input features are standardized and we set ω = 0.1. For CaliH, we use the latent features from the last layer of a neural network with 2 fully connected layers of 64 and 10 hidden units and the rectified linear unit (ReLU) as the activation function. (See the model sketch after this table.)
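
The 60/40 split quoted in the Dataset Splits row could be reproduced with a minimal NumPy sketch like the one below. The placeholder data, the random seed, and the equal three-way partition of the 60% portion into X_S1, X_S2, and X_S3 are illustrative assumptions, not details taken from the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))   # placeholder features (assumed shape)
y = rng.normal(size=1000)        # placeholder targets

# Shuffle indices, then take 60% for the data vendors and 40% for validation.
perm = rng.permutation(len(X))
n_train = int(0.6 * len(X))
train_idx, val_idx = perm[:n_train], perm[n_train:]

# The 40% holdout serves as the validation set for LOO and VLSV.
X_val, y_val = X[val_idx], y[val_idx]

# Partition the 60% portion among the three vendor datasets; an equal
# split is an assumption here, not a detail from the paper.
X_S1, X_S2, X_S3 = np.array_split(X[train_idx], 3)
```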
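
The CaliH setup in the Experiment Setup row translates to roughly the following PyTorch sketch: standardized inputs pass through two fully connected layers of 64 and 10 units with ReLU activations, and the last hidden layer's output serves as the latent features. The input dimension (8, matching California housing), the regression head, and the placement of standardization are assumptions for illustration; ω = 0.1, which we read as the paper's discretization width for robust volume, is not implemented here.

```python
import torch
import torch.nn as nn

class LatentFeatureNet(nn.Module):
    """Sketch of the two-layer ReLU network described above (assumed head)."""
    def __init__(self, in_dim: int = 8):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, 10), nn.ReLU(),
        )
        self.head = nn.Linear(10, 1)  # regression head for price prediction

    def forward(self, x):
        return self.head(self.body(x))

    def latent(self, x):
        # Latent features from the last hidden layer, used for valuation.
        return self.body(x)

# Standardize the input features before extracting latent features.
X = torch.randn(100, 8)              # placeholder data
X = (X - X.mean(0)) / X.std(0)
feats = LatentFeatureNet().latent(X) # shape: (100, 10)
```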