Model Shapley: Equitable Model Valuation with Black-box Access
Authors: Xinyi Xu, Thanh Lam, Chuan Sheng Foo, Bryan Kian Hsiang Low
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive empirical validation of the effectiveness of Model Shapley using various real-world datasets and heterogeneous model types. Our implementation trains a Gaussian process regression model (GPR, as the model appraiser) on the MSVs of a subset of the N = 150 models and examines its predictive performance on the remaining ones. |
| Researcher Affiliation | Academia | Dept. of Computer Science, National University of Singapore, Republic of Singapore; Inst. for Infocomm Research and Centre for Frontier AI Research, A*STAR, Republic of Singapore |
| Pseudocode | No | The paper does not contain any pseudocode blocks or clearly labeled algorithm sections. |
| Open Source Code | Yes | Our implementation is available at https://github.com/XinyiYS/ModelShapley. |
| Open Datasets | Yes | We train N = 150 independent models on MNIST (CIFAR-10)... We investigate 5 real-world datasets... including MNIST, CIFAR-10 [41], two medical datasets: a drug reviews dataset... (DrugRe) [23] and a medical imaging dataset... (MedNIST) [58], and a cyber-threat detection dataset... (KDD99) [28]. We perform additional experiments on CovType [5], MNIST and CIFAR-100. |
| Dataset Splits | Yes | We train a GPR (as the model appraiser) on a random subset of the N = 150 model-MSV pairs to learn to predict the MSVs of the remaining pairs. We examine the test performance using two error metrics: mean-squared error (MSE) and maximum error (MaxE) w.r.t. training ratios varied from 5% to 50%, in Fig. 2. In particular, results for a training ratio of 20% are in Table 2. |
| Hardware Specification | Yes | We perform our experiments on a server with an Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz and four NVIDIA GeForce RTX 3080s. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as programming languages, libraries, or frameworks used in the experiments. It mentions an 'automatic differentiation package' in a footnote but without a version. |
| Experiment Setup | No | The paper describes aspects of the experimental setup, such as the model types used (e.g., LR, MLP, CNN, ResNet-18, SqueezeNet, DenseNet-121), the kernel for the GPR (squared exponential), and data manipulation (e.g., multiplying a probability by a factor). However, it lacks crucial hyperparameters for training the underlying N = 150 models (e.g., learning rates, batch sizes, optimizers, epoch counts), which are essential for fully reproducing the models themselves. |
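The appraiser protocol described above (a GPR with a squared-exponential kernel, fit on a fraction of the 150 model-MSV pairs and scored on the rest with MSE and maximum error) can be sketched as follows. This is a minimal illustration, not the authors' code: the model feature vectors and MSV labels here are random placeholders, since the paper's actual model representations and Shapley values are not reproduced in this report.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder stand-ins: each of the N = 150 models gets a feature vector
# and an MSV label. Real features/labels would come from the paper's pipeline.
N, d = 150, 8
X = rng.normal(size=(N, d))                              # model representations
y = X @ rng.normal(size=d) + 0.05 * rng.normal(size=N)   # MSV labels

def appraise(train_ratio, seed=0):
    """Fit the GPR appraiser on a random subset and score it on the rest."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=train_ratio, random_state=seed)
    # RBF is scikit-learn's name for the squared-exponential kernel.
    gpr = GaussianProcessRegressor(kernel=RBF(), normalize_y=True)
    gpr.fit(X_tr, y_tr)
    pred = gpr.predict(X_te)
    mse = mean_squared_error(y_te, pred)
    max_e = float(np.max(np.abs(y_te - pred)))
    return mse, max_e

# Sweep the training ratios reported in the paper (5% to 50%).
for ratio in (0.05, 0.2, 0.5):
    mse, max_e = appraise(ratio)
    print(f"train ratio {ratio:.0%}: MSE={mse:.4f}, MaxE={max_e:.4f}")
```

With real MSVs, the 20% row of this sweep would correspond to the Table 2 setting quoted above.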