DU-Shapley: A Shapley Value Proxy for Efficient Dataset Valuation
Authors: Felipe Garrido Lucero, Benjamin Heymann, Maxime Vono, Patrick Loiseau, Vianney Perchet
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate the benefits of DU-Shapley by measuring numerically three properties: (1) how well DU-Shapley approximates the Shapley value on real data, (2) how many (theoretical) iterations other methods need to achieve the same accuracy level as DU-Shapley, and (3) how well DU-Shapley performs on classical dataset valuation tasks with real data. |
| Researcher Affiliation | Collaboration | Felipe Garrido-Lucero* Inria, Fairplay joint team Palaiseau, France felipe.garrido-lucero@irit.fr Benjamin Heymann* Criteo AI Lab Paris, France b.heymann@criteo.com Maxime Vono* Criteo AI Lab Paris, France m.vono@criteo.com Patrick Loiseau Inria, Fairplay joint team Palaiseau, France patrick.loiseau@inria.fr Vianney Perchet ENSAE, Fair Play joint team Palaiseau, France vianney@ensae.fr |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: The code is included and the implementations are detailed in the appendix. |
| Open Datasets | Yes | We consider the real-world datasets in Mitchell et al. [32], whose details are provided in Table 3 in the appendix. To tackle these problems we consider logistic regression models and gradient-boosted decision trees (GBDT). For classification tasks, the utility function has been taken as the expected accuracy of the trained logistic regression model over a hold-out testing set, while for regression tasks, the utility function corresponds to the averaged MSE over a hold-out testing set. In both cases we took a hold-out testing set with 10% of the size of the training dataset. Table 3 (datasets considered in Section 4.1): adult [21] (size 48,842, d = 107, classification); breast-cancer [30] (699, d = 30, classification); bank [33] (45,211, d = 16, classification); cal-housing [19] (20,640, d = 8, regression); make-regression [36] (1,000, d = 10, regression); year [36] (515,345, d = 90, regression). |
| Dataset Splits | No | In both cases we took a hold-out testing set with 10% of the size of the training dataset. The paper states a test-set percentage but does not specify training/validation splits, their percentages, or how any implicit validation set is derived. |
| Hardware Specification | Yes | All experiments were executed on a laptop running mac OS 13.3.1 and equipped with Apple M1 chip with 16GB of RAM. |
| Software Dependencies | No | The paper mentions general software types and models used but does not provide specific version numbers for libraries or frameworks (e.g., Python, PyTorch, scikit-learn versions). |
| Experiment Setup | Yes | Since computing the marginal contributions in this experiment requires re-training, which is clearly not feasible for a large number of epochs, we chose to restrict ourselves to 20 steps of stochastic gradient descent for logistic regression and 20 boosting iterations for GBDTs. |
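For context on why re-training-based Shapley estimation is costly (the bottleneck the paper's 20-iteration restriction works around, and the reason proxies such as DU-Shapley exist): the exact Shapley value averages each player's marginal contribution over all orderings of the players, which is factorial in the number of players and requires one utility (model re-training) evaluation per marginal. The sketch below is illustrative only, not the paper's method; the player names, dataset sizes, and square-root utility are hypothetical stand-ins for a trained model's test performance.

```python
import itertools
import math

def shapley_values(players, utility):
    """Exact Shapley values: average each player's marginal
    contribution over all n! orderings. Each marginal needs a
    utility evaluation (in dataset valuation, a model re-training),
    which is what makes exact computation infeasible at scale."""
    values = {p: 0.0 for p in players}
    n_perms = math.factorial(len(players))
    for order in itertools.permutations(players):
        coalition = []
        for p in order:
            before = utility(coalition)
            coalition.append(p)
            values[p] += (utility(coalition) - before) / n_perms
    return values

# Hypothetical owners and dataset sizes (not from the paper); the
# utility models diminishing returns in the pooled number of points.
sizes = {"A": 1000, "B": 500, "C": 250, "D": 125}
utility = lambda coalition: sum(sizes[p] for p in coalition) ** 0.5

phi = shapley_values(list(sizes), utility)
print(phi)
```

With 4 players this already costs 4! = 24 orderings of 4 marginals each; DU-Shapley's appeal is avoiding that enumeration. A quick sanity check is the efficiency axiom: the values sum to the grand coalition's utility minus the empty coalition's.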