reproducibilityindex.ai

Validation Free and Replication Robust Volume-based Data Valuation

Authors: Xinyi Xu, Zhaoxuan Wu, Chuan Sheng Foo, Bryan Kian Hsiang Low

NeurIPS 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We perform extensive experiments to demonstrate its consistency in valuation and practical advantages over existing baselines and show that our method is model- and task-agnostic and can be flexibly adapted to handle various neural networks.
Researcher Affiliation	Academia	Dept. of Computer Science, National University of Singapore, Republic of Singapore; Institute of Data Science, National University of Singapore, Republic of Singapore; Integrative Sciences and Engineering Programme, NUSGS, Republic of Singapore; Institute for Infocomm Research, A*STAR, Republic of Singapore
Pseudocode	No	No pseudocode or algorithm blocks were found.
Open Source Code	Yes	Our code is publicly available at: https://github.com/ZhaoxuanWu/VolumeBased-DataValuation.
Open Datasets	Yes	We use two real-world datasets: credit card fraud detection [2] (i.e., transaction amount prediction) and Uber & Lyft [5] (i.e., carpool ride price prediction)... UK used car dataset [1] (i.e., car price prediction) and credit card fraud detection dataset [2] (i.e., transaction amount prediction), TripAdvisor hotel reviews dataset [4], California housing price prediction (CaliH) [20], Kings county housing sales prediction (KingH) [3], US census income prediction (USCensus) [6], and age estimation from facial images (FaceA) [41].
Dataset Splits	Yes	We use 60% of data to construct X_{S_1}, X_{S_2}, and X_{S_3} and the remaining 40% as the validation set for LOO and VLSV.
Hardware Specification	Yes	All experiments have been run on a server with Intel(R) Xeon(R) @ 2.70GHz processor and 256GB RAM.
Software Dependencies	No	No specific software dependencies with version numbers were explicitly mentioned.
Experiment Setup	Yes	The input features are standardized and we set ω = 0.1. For CaliH, we use the latent features from the last layer of a neural network with 2 fully connected layers of 64 and 10 hidden units and the rectified linear unit (ReLU) as the activation function.
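
The preprocessing described in the table (standardized inputs, a 60%/40% split, and 10-dimensional latent features from a 64-then-10-unit ReLU network) can be sketched as follows. This is a minimal illustration, not the authors' code. The function names, the synthetic 8-feature data, the random seed, the untrained random weights, standardizing before splitting, and applying ReLU after both layers are all assumptions. The table reports the split and the latent-feature network for different experiments, so they are combined here only for illustration. The parameter ω is not used.

```python
import random
import statistics


def standardize(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def split_60_40(rows, seed=0):
    """Shuffle a copy of the rows and return (60% pool, 40% validation set)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 6 // 10
    return shuffled[:cut], shuffled[cut:]


def init_layer(n_in, n_out, rng):
    """Hypothetical random initialization; a real extractor would be trained."""
    scale = 1.0 / n_in ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense_relu(x, weights, biases):
    """One fully connected layer followed by ReLU."""
    return [
        max(0.0, sum(w * v for w, v in zip(w_row, x)) + b)
        for w_row, b in zip(weights, biases)
    ]


def latent_features(rows, layers):
    """Pass each row through every layer and keep the last layer's output."""
    features = []
    for x in rows:
        for weights, biases in layers:
            x = dense_relu(x, weights, biases)
        features.append(x)
    return features


rng = random.Random(0)
# Synthetic stand-in data: 50 rows with 8 features each (assumed sizes).
data = [[rng.uniform(0.0, 100.0) for _ in range(8)] for _ in range(50)]
scaled = standardize(data)
pool, validation = split_60_40(scaled)
# Two fully connected layers with 64 and 10 units, as in the table.
layers = [init_layer(8, 64, rng), init_layer(64, 10, rng)]
features = latent_features(pool, layers)
```

With 50 synthetic rows, `pool` holds 30 rows, `validation` holds 20, and `features` holds one 10-dimensional vector per pool row.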