Incentivizing Collaboration in Machine Learning via Synthetic Data Rewards
Authors: Sebastian Shenghong Tay, Xinyi Xu, Chuan Sheng Foo, Bryan Kian Hsiang Low (pp. 9448-9456)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper presents a novel collaborative generative modeling (CGM) framework that incentivizes collaboration among self-interested parties to contribute data to a pool for training a generative model (e.g., GAN), from which synthetic data are drawn and distributed to the parties as rewards commensurate to their contributions. We empirically show using simulated and real-world datasets that the parties' synthetic data rewards are commensurate to their contributions. This section empirically evaluates the performance of our CGM framework using simulated and real-world datasets. |
| Researcher Affiliation | Collaboration | Sebastian Shenghong Tay1,2, Xinyi Xu1,2, Chuan Sheng Foo2, Bryan Kian Hsiang Low1 1Department of Computer Science, National University of Singapore, Singapore 2Institute for Infocomm Research, A*STAR, Singapore |
| Pseudocode | No | The paper refers to 'Algo. 1 in (Tay et al. 2021)' within the text, implying the existence of an algorithm block in an extended version of the paper. However, this specific document does not contain a clearly labeled algorithm or pseudocode block. |
| Open Source Code | No | The paper does not provide any specific link or statement about releasing its own source code for the methodology described. |
| Open Datasets | Yes | We use the real-world credit card (CC) fraud dataset (Dal Pozzolo et al. 2015) containing European credit card transactions... using the real-world MNIST (Le Cun et al. 1998) and CIFAR10 (Krizhevsky 2009) image datasets as surrogates. |
| Dataset Splits | No | The paper describes how data is split among parties ('equal disjoint' and 'unequal' splits) but does not specify distinct training, validation, and test splits with percentages, counts, or references to predefined validation sets for model training and evaluation. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with their version numbers (e.g., specific libraries, frameworks, or programming language versions). |
| Experiment Setup | Yes | We use the squared exponential kernel with its length-scale computed using the binary search algorithm. Our full CGM framework, which includes computing the normalized Shapley values α1, ..., αn... solving the LP... and running the weighted sampling algorithm... is applied across all datasets and splits. We simulate supervised learning scenarios where each party trains an SVM on its real and synthetic data and predicts the class labels on unseen real data. |
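The experiment setup mentions computing normalized Shapley values α1, ..., αn to size each party's synthetic data reward. A minimal sketch of exact Shapley value computation for a small number of parties is shown below; the `coalition_value` function and the max-normalization are assumptions for illustration, since the paper's precise valuation and normalization are not reproduced here.

```python
from itertools import combinations
from math import factorial

def shapley_values(parties, coalition_value):
    """Exact Shapley values for a small set of parties.

    coalition_value maps a frozenset of parties to a real number
    (e.g., a score of a model trained on the pooled data of that coalition).
    """
    n = len(parties)
    values = {}
    for p in parties:
        others = [q for q in parties if q != p]
        total = 0.0
        for k in range(n):
            # Weight |S|!(n-|S|-1)!/n! for each coalition S not containing p.
            for subset in combinations(others, k):
                s = frozenset(subset)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (coalition_value(s | {p}) - coalition_value(s))
        values[p] = total
    return values

def normalized_shapley(values):
    """Scale so the largest contributor's share is 1 (an assumed
    normalization; the paper only names alpha_1, ..., alpha_n)."""
    top = max(values.values())
    return {p: v / top for p, v in values.items()}
```

For an additive coalition value (each party contributes independently), the Shapley value of a party equals its individual contribution, which makes this sketch easy to sanity-check.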
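The setup also states that the squared exponential kernel's length-scale is computed by binary search. A hedged sketch of one plausible realization follows: the kernel itself is standard, but the search criterion (driving the mean off-diagonal kernel value to a target) is an assumption, as the paper only states that binary search is used.

```python
import numpy as np

def sq_exp_kernel(X, Y, lengthscale):
    """Squared exponential kernel k(x, y) = exp(-||x - y||^2 / (2 l^2))."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * lengthscale ** 2))

def tune_lengthscale(X, target=0.5, lo=1e-3, hi=1e3, iters=60):
    """Binary-search a length-scale so the mean off-diagonal kernel
    value hits `target` (an assumed criterion, for illustration)."""
    def mean_offdiag(l):
        K = sq_exp_kernel(X, X, l)
        n = len(X)
        return (K.sum() - n) / (n * (n - 1))
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if mean_offdiag(mid) < target:
            lo = mid  # larger length-scale -> larger kernel values
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Binary search applies here because the mean kernel value is monotonically increasing in the length-scale, so any target in (0, 1) is bracketed by a sufficiently small and sufficiently large length-scale.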