Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Incentives in Private Collaborative Machine Learning
Authors: Rachael Sim, Yehong Zhang, Nghia Hoang, Xinyi Xu, Bryan Kian Hsiang Low, Patrick Jaillet
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate the effectiveness and practicality of our approach on synthetic and real-world datasets. This section empirically evaluates the privacy-valuation and privacy-reward trade-offs (Sec. 6.1), reward control mechanisms (Sec. 6.2), and their relationship with the utility of the model rewards (Sec. 6.3). The time complexity of our scheme is analyzed in App. F and baseline methods are discussed in App. H.3. We consider Bayesian linear regression (BLR) with unknown variance on the Syn and Cali H datasets, and Bayesian logistic regression on the Diab dataset with 3 collaborating parties (see App. H.1 for details) and enforce $(2, \epsilon_i)$-Rényi DP. |
| Researcher Affiliation | Collaboration | 1 Department of Computer Science, National University of Singapore, Republic of Singapore 2 Peng Cheng Laboratory, People's Republic of China 3 School of Electrical Engineering and Computer Science, Washington State University, USA 4 Dept. of Electrical Engineering and Computer Science, MIT, USA |
| Pseudocode | Yes | Algorithm 1 BLR Gibbs sampler [4] from noise-aware posterior $p(\theta \mid O_N = o_N) \propto \int \prod_{i \in N} [p(o_i \mid s_i)\, p(s_i \mid \theta)]\, p(\theta)\, ds_1 \cdots ds_n$. The algorithm (repeatedly) samples the latent variables $S_i$, $\omega$ and $\theta$ sequentially. Algorithm 2 An overview of our collaborative ML problem setup. The computational complexity is given in App. F. |
| Open Source Code | No | The paper does not provide an explicit statement or link to its own open-source code for the described methodology. |
| Open Datasets | Yes | We consider Bayesian linear regression (BLR) with unknown variance on the Syn and Cali H datasets, and Bayesian logistic regression on the Diab dataset with 3 collaborating parties (see App. H.1 for details) and enforce $(2, \epsilon_i)$-Rényi DP. For Californian Housing dataset (Cali H) [44],... For PIMA Indian Diabetes classification dataset (Diab) [50], |
| Dataset Splits | Yes | We split the training and the validation set using an 80-20 split. There are 614 training data points. There are 35.6% and 31.8% of patients with diabetes in the training and validation sets, respectively. |
| Hardware Specification | Yes | The experiments are performed on a machine with Ubuntu 20.04 LTS, 2 Intel Xeon Gold 6230 (2.1GHz) without GPU. |
| Software Dependencies | No | The software environments used are Miniconda and Python. A full list of packages used is given in the file environment.yml attached. |
| Experiment Setup | Yes | The normal inverse-gamma distribution used (i) to generate the true regression model weights, variance, and a 2D dataset and (ii) as our model prior is as follows: $\sigma^2 \sim \text{InvGamma}(\alpha_0 = 5, \beta_0 = 0.1)$ where $\alpha_0$ and $\beta_0$ are, respectively, the inverse-gamma shape and scale parameters, and $w \mid \sigma^2 \sim \mathcal{N}(0, \sigma^2 \Lambda_0^{-1})$ where $\Lambda_0 = 0.025\, I$. We consider three parties 1, 2, and 3 with $c_0 = 100$, $c_1 = 200$, and $c_2 = 400$ data points, respectively. We fix $\epsilon_1 = \epsilon_3 = 0.2$ and vary $\epsilon_2$ from the default 0.1. One posterior sampling run generates 16 Gibbs sampling chains in parallel. For each chain, we discard the first 10000 burn-in samples and draw $m = 30000$ samples. To reduce the closeness/correlation between samples which will affect the nearest-neighbor-based KL estimation, we thin and only keep every 16-th sample and concatenate the thinned samples across all 16 chains. |
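
The prior and the thinning schedule quoted in the Experiment Setup row can be illustrated with a short sketch. This is not the authors' code: it uses only the Python standard library, the function names `sample_nig_prior` and `thin_chain` are hypothetical, the weight vector is assumed to have two components (read from "2D dataset"), and each chain is assumed to hold the 10000 burn-in draws followed by the 30000 retained draws. The chains are filled with placeholder integers rather than real Gibbs samples, so only the bookkeeping is shown.

```python
import math
import random

ALPHA0 = 5.0       # inverse-gamma shape, as quoted in the setup row
BETA0 = 0.1        # inverse-gamma scale, as quoted in the setup row
PRECISION = 0.025  # Lambda_0 = 0.025 * I


def sample_nig_prior(rng, dim=2):
    """Draw (w, sigma2) from the normal inverse-gamma prior.

    dim=2 is an assumption for illustration.
    """
    # If X ~ Gamma(shape=a, scale=1/b), then 1/X ~ InvGamma(shape=a, scale=b).
    sigma2 = 1.0 / rng.gammavariate(ALPHA0, 1.0 / BETA0)
    # Lambda_0 is a multiple of the identity, so the components of w are
    # independent Gaussians with variance sigma2 / 0.025.
    std = math.sqrt(sigma2 / PRECISION)
    w = [rng.gauss(0.0, std) for _ in range(dim)]
    return w, sigma2


def thin_chain(samples, burn_in=10_000, keep_every=16):
    """Drop the burn-in prefix, then keep every `keep_every`-th sample."""
    return samples[burn_in::keep_every]


rng = random.Random(0)
w, sigma2 = sample_nig_prior(rng)

# Placeholder chains: integers stand in for posterior draws.
n_chains, burn_in, m = 16, 10_000, 30_000
chains = [list(range(burn_in + m)) for _ in range(n_chains)]

# Thin each chain and concatenate the survivors across all chains.
pooled = [s for chain in chains for s in thin_chain(chain, burn_in, 16)]
print(len(pooled))
```

Under these assumptions each chain keeps 30000 / 16 = 1875 samples after thinning, so pooling 16 chains gives 30000 samples in total for the downstream estimation.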