Robust Data Valuation with Weighted Banzhaf Values
Authors: Weida Li, Yaoliang Yu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical study shows that the Banzhaf value is not always the most robust when compared with a broader family: weighted Banzhaf values. To analyze this scenario, we introduce the concept of Kronecker noise to parameterize stochasticity, through which we prove that the uniquely robust semi-value, which can be analytically derived from the underlying Kronecker noise, lies in the family of weighted Banzhaf values while minimizing the worst-case entropy. In addition, we adopt the maximum sample reuse principle to design an estimator to efficiently approximate weighted Banzhaf values, and show that it enjoys the best time complexity in terms of achieving an (ϵ, δ)-approximation. Our theory is verified under both synthetic and authentic noises. For the latter, we fit a Kronecker noise to the inherent stochasticity, which is then plugged in to generate the predicted most robust semi-value. Our study suggests that weighted Banzhaf values are promising when facing undue noises in data valuation. |
| Researcher Affiliation | Academia | Weida Li vidaslee@gmail.com Yaoliang Yu School of Computer Science University of Waterloo Vector Institute yaoliang.yu@uwaterloo.ca |
| Pseudocode | No | The paper describes the estimation process using mathematical formulas (e.g., Eq. 4) and prose, but it does not include a clearly labeled pseudocode block or algorithm section. |
| Open Source Code | Yes | Our code is available at https://github.com/watml/weighted-Banzhaf. |
| Open Datasets | Yes | All datasets used are from open sources, and are classification tasks. Except for MNIST and FMNIST, each Dtr or Dval is balanced between different classes. Unless explicitly stated otherwise, we set |Dval| = 200. All utility functions are set to be the accuracy reported on Dval with logistic regression models being trained on Dtr, except that we implement LeNet (LeCun et al., 1998) for MNIST and FMNIST. (...) The datasets we use in the main paper are summarized in Table 4. |
| Dataset Splits | Yes | Let Dtr and Dval be training and validation datasets, respectively. We write n = |Dtr| (...) Unless explicitly stated otherwise, we set |Dval| = 200. All utility functions are set to be the accuracy reported on Dval with logistic regression models being trained on Dtr, except that we implement LeNet (LeCun et al., 1998) for MNIST and FMNIST. (...) We fix |Dtr| = 1,000 for all datasets except that it is |Dtr| = 2,000 for MNIST and FMNIST. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or cloud computing instances used for running the experiments. |
| Software Dependencies | No | The paper mentions implementing 'LeNet (LeCun et al., 1998) for MNIST and FMNIST' and 'logistic regression models'. However, it does not provide specific version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used. |
| Experiment Setup | Yes | All utility functions are set to be the accuracy reported on Dval with logistic regression models being trained on Dtr, except that we implement LeNet (LeCun et al., 1998) for MNIST and FMNIST. To have the merit of efficiency, we adopt one-epoch one-mini-batch learning for training models in all types of experiments (Ghorbani and Zou, 2019). (...) Besides, the learning rate is set to be 1.0. (...) The learning rate is set to be 0.05. (...) The total number of utility evaluations is set to be 400,000. (...) For each dataset, we randomly flip the labels of 20 percent of the data in Dtr to any of the remaining classes in a uniform manner. |
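The paper provides no pseudocode for its estimator, describing it instead via Eq. 4 and prose. As a hedged illustration only, the sketch below shows the general maximum-sample-reuse (MSR) principle for approximating weighted Banzhaf values: every sampled subset S (each player included independently with probability w) is reused for all players at once, contributing U(S) to player i's "with i" average if i ∈ S and to its "without i" average otherwise. The function name `msr_weighted_banzhaf`, the `utility` callable, and the parameter names are our own, not the paper's, and this is not claimed to reproduce the paper's exact Eq. 4.

```python
import numpy as np

def msr_weighted_banzhaf(utility, n, w=0.5, num_samples=10_000, seed=0):
    """Monte Carlo MSR-style estimator for weighted Banzhaf values.

    utility: callable mapping a boolean inclusion mask of shape (n,) to a scalar.
    w: per-player inclusion probability; w = 0.5 recovers the ordinary Banzhaf value.
    Each sampled subset is reused for every player i, so one utility evaluation
    updates n running averages at once (the "maximum sample reuse" idea).
    """
    rng = np.random.default_rng(seed)
    sums_in = np.zeros(n)
    counts_in = np.zeros(n)
    sums_out = np.zeros(n)
    counts_out = np.zeros(n)
    for _ in range(num_samples):
        mask = rng.random(n) < w  # include each player independently w.p. w
        u = utility(mask)
        sums_in[mask] += u
        counts_in[mask] += 1
        sums_out[~mask] += u
        counts_out[~mask] += 1
    # Guard against players that were never sampled in (or out of) a subset.
    mean_in = np.divide(sums_in, counts_in, out=np.zeros(n), where=counts_in > 0)
    mean_out = np.divide(sums_out, counts_out, out=np.zeros(n), where=counts_out > 0)
    # Weighted Banzhaf estimate: E[U(S) | i in S] - E[U(S) | i not in S].
    return mean_in - mean_out
```

For a quick sanity check, an additive utility U(S) = Σ_{i∈S} v_i has constant marginal contributions, so every semi-value (including any weighted Banzhaf value) equals v_i exactly, and the estimate should converge to v as the sample count grows.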