DU-Shapley: A Shapley Value Proxy for Efficient Dataset Valuation

Authors: Felipe Garrido-Lucero, Benjamin Heymann, Maxime Vono, Patrick Loiseau, Vianney Perchet

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate the benefits of DU-Shapley by numerically measuring three properties: (1) how well DU-Shapley approximates the Shapley value on real data, (2) how many (theoretical) iterations other methods need to reach the same accuracy level as DU-Shapley, and (3) how well DU-Shapley performs in classical dataset valuation tasks with real data. (See the first sketch below the table.)
Researcher Affiliation | Collaboration | Felipe Garrido-Lucero* (Inria, Fairplay joint team, Palaiseau, France, felipe.garrido-lucero@irit.fr); Benjamin Heymann* (Criteo AI Lab, Paris, France, b.heymann@criteo.com); Maxime Vono* (Criteo AI Lab, Paris, France, m.vono@criteo.com); Patrick Loiseau (Inria, Fairplay joint team, Palaiseau, France, patrick.loiseau@inria.fr); Vianney Perchet (ENSAE, Fairplay joint team, Palaiseau, France, vianney@ensae.fr)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: The code is included and the implementations are detailed in the appendix.
Open Datasets | Yes | We consider the real-world datasets in Mitchell et al. [32], whose details are provided in Table 3 in the appendix. To tackle these problems we consider logistic regression models and gradient-boosted decision trees (GBDT). For classification tasks, the utility function is the expected accuracy of the trained logistic regression model over a hold-out testing set, while for regression tasks it is the averaged MSE over a hold-out testing set. In both cases the hold-out testing set has 10% of the size of the training dataset. (See the second sketch below the table.)

Table 3: Datasets considered in Section 4.1.
Dataset | Size | d | Task
adult [21] | 48,842 | 107 | classification
breast-cancer [30] | 699 | 30 | classification
bank [33] | 45,211 | 16 | classification
cal-housing [19] | 20,640 | 8 | regression
make-regression [36] | 1,000 | 10 | regression
year [36] | 515,345 | 90 | regression
Dataset Splits | No | The paper specifies a hold-out test set of 10% of the training-set size, but it does not describe training/validation splits, their percentages, or how a validation set (if any) is derived.
Hardware Specification | Yes | All experiments were executed on a laptop running macOS 13.3.1, equipped with an Apple M1 chip and 16GB of RAM.
Software Dependencies | No | The paper names the models and general software types used but does not provide specific version numbers for libraries or frameworks (e.g., Python, PyTorch, scikit-learn).
Experiment Setup | Yes | Since computing the marginal contributions in this experiment requires re-training, which is clearly not feasible for a large number of epochs, we chose to restrict ourselves to 20 steps of stochastic gradient descent for logistic regression and 20 boosting iterations for GBDTs. (See the third sketch below the table.)
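
To make the "how well DU-Shapley approximates the Shapley value" row concrete, here is a minimal sketch of a discrete-uniform-style Shapley proxy. The paper's exact formula is not quoted in this report, so the construction below encodes only our reading of it (the utility is treated as a function of the pooled dataset size, and the other players are summarized by their mean dataset size); the function names and the toy concave utility are ours, not the authors'.

```python
import numpy as np

def du_shapley(i, sizes, utility):
    """Discrete-uniform-style Shapley proxy for player i (our reading of
    the paper's construction, not a quoted formula). The utility is assumed
    to depend only on the number of pooled data points, and the other
    players are replaced by their mean dataset size mu, so the proxy needs
    only O(n) utility evaluations instead of 2^n coalition evaluations."""
    n = len(sizes)
    mu = np.mean([s for j, s in enumerate(sizes) if j != i])
    # Average marginal gain of adding player i to coalitions made of
    # k "average" players, for k = 0, ..., n-1.
    gains = [utility(int(round(k * mu)) + sizes[i]) - utility(int(round(k * mu)))
             for k in range(n)]
    return float(np.mean(gains))

# Toy usage: a concave utility with diminishing returns in data volume.
sizes = [500, 1500, 3000]
u = lambda m: 1.0 - np.exp(-m / 2000.0)
print([round(du_shapley(i, sizes, u), 4) for i in range(len(sizes))])
```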
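
The classification utility quoted in the Open Datasets row is straightforward to sketch. Below is a minimal, hypothetical implementation assuming coalitions are represented as index sets into a shared training pool and that X, y are NumPy arrays; the name make_classification_utility, the 0.0 fallback for degenerate coalitions, and the fixed random seed are our choices, not the paper's. The regression variant would swap in a regressor and the averaged MSE.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def make_classification_utility(X, y, test_frac=0.1, seed=0):
    """Build u(S) = hold-out accuracy of a logistic regression trained on
    the pooled data of coalition S (given as an array of row indices).
    test_frac=0.1 mirrors the quoted "10% of the size of the training
    dataset"; everything else here is an illustrative assumption."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_frac, random_state=seed)

    def utility(idx):
        idx = np.asarray(idx, dtype=int)
        # Empty or single-class coalitions cannot fit a classifier;
        # returning 0.0 here is our convention, not the paper's.
        if idx.size == 0 or np.unique(y_tr[idx]).size < 2:
            return 0.0
        model = LogisticRegression(max_iter=1000).fit(X_tr[idx], y_tr[idx])
        return accuracy_score(y_te, model.predict(X_te))

    return utility
```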
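
The training budget quoted in the Experiment Setup row can be mirrored in scikit-learn, with a caveat: SGDClassifier's max_iter counts passes over the data rather than individual gradient updates, so reading it as "20 SGD steps" is an approximation, and the paper's actual implementation may differ.

```python
from sklearn.linear_model import SGDClassifier
from sklearn.ensemble import GradientBoostingRegressor

# Rough stand-ins for the quoted training budget (an assumption, see above):
# 20 epochs of SGD for logistic regression, 20 boosting iterations for GBDTs.
logreg = SGDClassifier(loss="log_loss",   # logistic loss ("log" in older sklearn)
                       max_iter=20, tol=None)  # tol=None forces all 20 epochs
gbdt = GradientBoostingRegressor(n_estimators=20)  # 20 boosting iterations
```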