CaPC Learning: Confidential and Private Collaborative Learning

Authors: Christopher A. Choquette-Choo, Natalie Dullerud, Adam Dziedzic, Yunxiang Zhang, Somesh Jha, Nicolas Papernot, Xiao Wang

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on SVHN and CIFAR10 demonstrate that CaPC enables participants to collaborate and improve the utility of their models, even in the heterogeneous setting where the architectures of their local models differ, and when there are only a few participants.
Researcher Affiliation | Collaboration | Christopher A. Choquette-Choo and Natalie Dullerud (University of Toronto and Vector Institute, {christopher.choquette.choo,natalie.dullerud}@mail.utoronto.ca); Adam Dziedzic (Vector Institute, ady@vectorinstitute.ai); Yunxiang Zhang (The Chinese University of Hong Kong, yunxiang.zhang@ie.cuhk.edu.hk); Somesh Jha (University of Wisconsin-Madison and XaiPient, jha@cs.wisc.edu); Nicolas Papernot (University of Toronto and Vector Institute, nicolas.papernot@utoronto.ca); Xiao Wang (Northwestern University, wangxiao@cs.northwestern.edu)
Pseudocode | No | The paper includes a protocol description in Figure 1, but it is a diagrammatic representation of steps, not structured pseudocode or an algorithm block.
Open Source Code | Yes | Code is available at: https://github.com/cleverhans-lab/capc-iclr.
Open Datasets | Yes | Our experiments on SVHN and CIFAR10 demonstrate that CaPC enables participants to collaborate and improve the utility of their models... We use the following for experiments unless otherwise noted. We uniformly sample from the training set in use, without replacement, to create disjoint partitions, Di, of equal size and identical data distribution for each party... We select K = 50 and K = 250 as the number of parties for CIFAR10 and SVHN, respectively (the number is larger for SVHN because we have more data).
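The partitioning described above (uniform sampling without replacement into equal-size, identically distributed disjoint partitions) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function name, seed, and NumPy usage are assumptions.

```python
import numpy as np

def make_disjoint_partitions(num_examples, num_parties, seed=0):
    """Shuffle example indices uniformly at random and split them into
    equal-size disjoint partitions, one per party. Any remainder that does
    not divide evenly is dropped (an assumption; the paper only states the
    partitions are disjoint and of equal size)."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(num_examples)  # sampling without replacement
    per_party = num_examples // num_parties
    return [indices[i * per_party:(i + 1) * per_party]
            for i in range(num_parties)]

# CIFAR10: 50,000 training examples split across K = 50 parties,
# giving each party a private partition Di of 1,000 examples.
partitions = make_disjoint_partitions(50_000, 50)
```

Because every index appears in exactly one partition, each party's Di is disjoint from the others while sharing the same underlying data distribution.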
Dataset Splits | No | The paper mentions training and testing, but does not explicitly provide details about a distinct validation split (percentages, counts, or methodology).
Hardware Specification | No | The paper mentions 'CPU' and 'GPU' in Table 1 and states 'HE-transformer only supports inference on CPUs', but it does not specify exact CPU or GPU models or other detailed hardware specifications used for experiments.
Software Dependencies | No | The paper mentions using the 'HE-transformer library with MPC (MP2ML)' and 'the EMP toolkit' along with their respective citations, but it does not provide specific version numbers for these software components.
Experiment Setup | Yes | We use the following for experiments unless otherwise noted. We uniformly sample from the training set in use, without replacement, to create disjoint partitions, Di, of equal size and identical data distribution for each party. We select K = 50 and K = 250 as the number of parties for CIFAR10 and SVHN, respectively (the number is larger for SVHN because we have more data). We select Q = 3 querying parties, Pi, and similarly divide part of the test set into Q separate private pools for each Pi to select queries, until their privacy budget of ϵ is reached (using Gaussian noise with σ = 40 on SVHN and 7 on CIFAR10). We fix ϵ = 2 and 20 for SVHN and CIFAR10, respectively (which leads to 550 queries per party), and report accuracy on the evaluation set. Querying models are retrained on their Di plus the newly labelled data; the difference in accuracies is their accuracy improvement. We use shallower variants of VGG, namely VGG-5 and VGG-7 for CIFAR10 and SVHN, respectively, to accommodate the small size of each party's private dataset.
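The Gaussian-noise labeling step in the setup above (σ = 40 on SVHN, σ = 7 on CIFAR10) follows the PATE-style noisy-argmax pattern: answering parties vote on a label, and Gaussian noise is added to the vote histogram before taking the argmax. The sketch below shows only this plaintext-equivalent aggregation; in CaPC the aggregation actually runs under secure computation, and the function and variable names here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def noisy_aggregate(votes, sigma, num_classes=10, rng=None):
    """Return the argmax of the per-class vote histogram after adding
    independent Gaussian noise N(0, sigma^2) to each count. This is the
    plaintext analogue of the private labeling step; real CaPC performs
    the aggregation securely across parties."""
    rng = rng or np.random.default_rng()
    histogram = np.bincount(votes, minlength=num_classes).astype(float)
    histogram += rng.normal(0.0, sigma, size=num_classes)
    return int(np.argmax(histogram))

# Example: votes from the 249 answering parties on SVHN
# (K = 250 parties minus the querying party), with sigma = 40.
rng = np.random.default_rng(0)
votes = rng.integers(0, 10, size=249)
label = noisy_aggregate(votes, sigma=40, rng=rng)
```

Larger σ gives stronger privacy per query but noisier labels, which is why the per-party privacy budget ϵ caps the number of queries (550 per party here) before the querying model is retrained on its Di plus the newly labelled data.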