Incentivizing Collaboration in Machine Learning via Synthetic Data Rewards
Authors: Sebastian Shenghong Tay, Xinyi Xu, Chuan Sheng Foo, Bryan Kian Hsiang Low (pp. 9448-9456)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper presents a novel collaborative generative modeling (CGM) framework that incentivizes collaboration among self-interested parties to contribute data to a pool for training a generative model (e.g., GAN), from which synthetic data are drawn and distributed to the parties as rewards commensurate to their contributions. We empirically show using simulated and real-world datasets that the parties' synthetic data rewards are commensurate to their contributions. This section empirically evaluates the performance of our CGM framework using simulated and real-world datasets. |
| Researcher Affiliation | Collaboration | Sebastian Shenghong Tay1,2, Xinyi Xu1,2, Chuan Sheng Foo2, Bryan Kian Hsiang Low1 1Department of Computer Science, National University of Singapore, Singapore 2Institute for Infocomm Research, A*STAR, Singapore |
| Pseudocode | No | The paper refers to 'Algo. 1 in (Tay et al. 2021)' within the text, implying the existence of an algorithm block in an extended version of the paper. However, this specific document does not contain a clearly labeled algorithm or pseudocode block. |
| Open Source Code | No | The paper does not provide any specific link or statement about releasing its own source code for the methodology described. |
| Open Datasets | Yes | We use the real-world credit card (CC) fraud dataset (Dal Pozzolo et al. 2015) containing European credit card transactions... using the real-world MNIST (Le Cun et al. 1998) and CIFAR10 (Krizhevsky 2009) image datasets as surrogates. |
| Dataset Splits | No | The paper describes how data is split among parties ('equal disjoint' and 'unequal' splits) but does not specify distinct training, validation, and test splits with percentages, counts, or references to predefined validation sets for model training and evaluation. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with their version numbers (e.g., specific libraries, frameworks, or programming language versions). |
| Experiment Setup | Yes | We use the squared exponential kernel with its length-scale computed using the binary search algorithm. Our full CGM framework, which includes computing the normalized Shapley values α1, ..., αn... solving the LP... and running the weighted sampling algorithm... is applied across all datasets and splits. We simulate supervised learning scenarios where each party trains an SVM on its real and synthetic data and predicts the class labels on unseen real data. |
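The experiment setup mentions computing normalized Shapley values α1, ..., αn to size each party's synthetic data reward. A minimal sketch of exact Shapley value computation for a small number of parties is shown below; the `coalition_value` function and the max-normalization are assumptions for illustration, since the paper's precise valuation and normalization are not reproduced here.

```python
from itertools import combinations
from math import factorial

def shapley_values(parties, coalition_value):
    """Exact Shapley values for a small set of parties.

    coalition_value maps a frozenset of parties to a real number
    (e.g., a score of a model trained on the pooled data of that coalition).
    """
    n = len(parties)
    values = {}
    for p in parties:
        others = [q for q in parties if q != p]
        total = 0.0
        for k in range(n):
            # Weight |S|!(n-|S|-1)!/n! for each coalition S not containing p.
            for subset in combinations(others, k):
                s = frozenset(subset)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (coalition_value(s | {p}) - coalition_value(s))
        values[p] = total
    return values

def normalized_shapley(values):
    """Scale so the largest contributor's share is 1 (an assumed
    normalization; the paper only names alpha_1, ..., alpha_n)."""
    top = max(values.values())
    return {p: v / top for p, v in values.items()}
```

For an additive coalition value (each party contributes independently), the Shapley value of a party equals its individual contribution, which makes this sketch easy to sanity-check.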
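The setup also states that the squared exponential kernel's length-scale is computed by binary search. A hedged sketch of one plausible realization follows: the kernel itself is standard, but the search criterion (driving the mean off-diagonal kernel value to a target) is an assumption, as the paper only states that binary search is used.

```python
import numpy as np

def sq_exp_kernel(X, Y, lengthscale):
    """Squared exponential kernel k(x, y) = exp(-||x - y||^2 / (2 l^2))."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * lengthscale ** 2))

def tune_lengthscale(X, target=0.5, lo=1e-3, hi=1e3, iters=60):
    """Binary-search a length-scale so the mean off-diagonal kernel
    value hits `target` (an assumed criterion, for illustration)."""
    def mean_offdiag(l):
        K = sq_exp_kernel(X, X, l)
        n = len(X)
        return (K.sum() - n) / (n * (n - 1))
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if mean_offdiag(mid) < target:
            lo = mid  # larger length-scale -> larger kernel values
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Binary search applies here because the mean kernel value is monotonically increasing in the length-scale, so any target in (0, 1) is bracketed by a sufficiently small and sufficiently large length-scale.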