Model Shapley: Equitable Model Valuation with Black-box Access

Authors: Xinyi Xu, Thanh Lam, Chuan Sheng Foo, Bryan Kian Hsiang Low

NeurIPS 2023

Reproducibility Variable Result LLM Response
Research Type Experimental We perform extensive empirical validation of the effectiveness of Model Shapley using various real-world datasets and heterogeneous model types. Our implementation trains a GPR (as the model appraiser) on the MSVs of a subset of the N = 150 models and examines its predictive performance on the remaining ones.
Researcher Affiliation Academia Dept. of Computer Science, National University of Singapore, Republic of Singapore; Inst. for Infocomm Research and Centre for Frontier AI Research, A*STAR, Republic of Singapore
Pseudocode No The paper does not contain any pseudocode blocks or clearly labeled algorithm sections.
Open Source Code Yes Our implementation is available at https://github.com/XinyiYS/ModelShapley.
Open Datasets Yes We train N = 150 independent models on MNIST (CIFAR-10)...We investigate 5 real-world datasets... including MNIST, CIFAR-10 [41], two medical datasets: a drug reviews dataset... (DrugRe) [23] and a medical imaging dataset... (MedNIST) [58], and a cyber-threat detection dataset... (KDD99) [28]. We perform additional experiments on CovType [5], MNIST and CIFAR-100.
Dataset Splits Yes We train a GPR (as the model appraiser) on a random subset of the 150 model-MSV pairs to learn to predict the MSV on the remaining pairs. We examine the test performance using two error metrics: mean-squared error (MSE) and maximum error (MaxE), with training ratios varied from 5% to 50%, in Fig. 2. In particular, results for a training ratio of 20% are in Table 2.
Hardware Specification Yes We perform our experiments on a server with an Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz and four NVIDIA GeForce RTX 3080s.
Software Dependencies No The paper does not provide specific version numbers for software dependencies such as programming languages, libraries, or frameworks used in the experiments. It mentions an 'automatic differentiation package' in a footnote but without a version.
Experiment Setup No The paper describes aspects of the experimental setup, such as the model types used (e.g., LR, MLP, CNN, ResNet-18, SqueezeNet, DenseNet-121), the squared-exponential kernel for the GPR, and data manipulations (e.g., multiplying a probability by a factor). However, it lacks crucial hyperparameters for training the underlying N = 150 models (e.g., learning rates, batch sizes, optimizers, epoch counts), which are essential for fully reproducing the models themselves.
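The appraiser evaluation described above (fit a GPR with a squared-exponential kernel on a subset of model-MSV pairs, then score it on the rest with MSE and maximum error across training ratios of 5% to 50%) can be sketched as follows. This is a minimal illustration assuming scikit-learn; the model features and MSV targets here are synthetic placeholders, not the paper's actual models or Shapley values.

```python
# Sketch of the model-appraiser evaluation: train a Gaussian process
# regressor (squared-exponential / RBF kernel) on a fraction of the
# (model, MSV) pairs and measure MSE and maximum error on the rest.
# X and y below are synthetic stand-ins, not the paper's data.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
N = 150                                          # number of (model, MSV) pairs
X = rng.normal(size=(N, 8))                      # stand-in model features
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=N)   # stand-in MSVs

for ratio in (0.05, 0.20, 0.50):                 # training ratios 5% to 50%
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=ratio, random_state=0)
    gpr = GaussianProcessRegressor(kernel=RBF(), normalize_y=True)
    gpr.fit(X_tr, y_tr)
    pred = gpr.predict(X_te)
    mse = np.mean((pred - y_te) ** 2)            # mean-squared error
    max_e = np.max(np.abs(pred - y_te))          # maximum error (MaxE)
    print(f"train ratio {ratio:.0%}: MSE={mse:.4f}, MaxE={max_e:.4f}")
```

At a 20% training ratio this corresponds to fitting on 30 pairs and predicting the remaining 120, matching the split reported in Table 2 of the paper.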