Outsourcing Training without Uploading Data via Efficient Collaborative Open-Source Sampling

Authors: Junyuan Hong, Lingjuan Lyu, Jiayu Zhou, Michael Spranger

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive empirical studies show that the proposed ECOS improves the quality of automated client labeling, model compression, and label outsourcing when applied in various learning scenarios.
Researcher Affiliation | Collaboration | Junyuan Hong (Michigan State University, hongju12@msu.edu); Lingjuan Lyu (Sony AI, lingjuan.lv@sony.com); Jiayu Zhou (Michigan State University, jiayuz@msu.edu); Michael Spranger (Sony AI, michael.spranger@sony.com)
Pseudocode | Yes | Algorithm 1: Efficient collaborative open-source sampling (ECOS); an illustrative sketch is given after the table.
Open Source Code | No | We include the instructions but not codes for reproducing results.
Open Datasets | Yes | We use datasets from two tasks: digit recognition and object recognition. Distinct from prior work [56], in our work the open-source data contains samples out of the client's distribution. With the same classes as the client dataset, we assume open-source data are from different environments and therefore include different feature distributions, for example, DomainNet [41] and Digits [28].
Dataset Splits | Yes | Splits of client and cloud datasets: for Digits, we use one domain for the client and the remaining domains for the cloud as the open-source set; for DomainNet, we randomly select 50% of the samples from one domain for the client and leave the remaining samples, together with all other domains, to the cloud. Each experiment is repeated three times with seeds {1, 2, 3}. A split sketch is given after the table.
Hardware Specification | No | The paper mentions a 'powerful cloud server' and a 'low-power and cost-effective end device' but does not specify exact GPU or CPU models or detailed cloud resource types used for the experiments in the main text.
Software Dependencies | No | The paper refers to various models and methods such as FixMatch, KMeans, ResNet50, and private kNN, but it does not specify software dependencies such as programming languages, deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries with their respective version numbers.
Experiment Setup | Yes | Details of hyper-parameters are deferred to Appendix B.1. On the selected samples, we train a linear classifier head for 30 epochs under the supervision of true labels and the teacher model f_t, and then fine-tune the full network f_s for 500 epochs. A training-schedule sketch is given after the table.
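
The Pseudocode row points to Algorithm 1 (ECOS), which this report does not reproduce. The following is only a minimal sketch of the general collaborative-sampling idea, under the assumption that the client shares compressed statistics (cluster centroids) of its private features and the cloud keeps the open-source samples closest to those centroids; the function names, feature space, and selection rule are assumptions, not the paper's exact Algorithm 1.

```python
# Illustrative sketch only, not the paper's Algorithm 1.
# Assumption: the client uploads k-means centroids of its private features,
# and the cloud keeps the open-source samples closest to any centroid.
import numpy as np
from sklearn.cluster import KMeans

def client_summarize(private_features: np.ndarray, n_centroids: int = 32) -> np.ndarray:
    """Client side: compress private features into centroids; no raw data leaves the device."""
    km = KMeans(n_clusters=n_centroids, n_init=10, random_state=0)
    km.fit(private_features)
    return km.cluster_centers_

def cloud_sample(open_features: np.ndarray, centroids: np.ndarray, budget: int) -> np.ndarray:
    """Cloud side: return indices of the `budget` open-source samples nearest to any client centroid."""
    # Squared distances between every open-source feature and every centroid.
    d2 = ((open_features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    score = d2.min(axis=1)              # distance to the closest centroid
    return np.argsort(score)[:budget]   # keep the most client-like open-source samples

# Toy usage with random vectors standing in for encoder features.
rng = np.random.default_rng(0)
client_feats = rng.normal(size=(500, 64))
open_feats = rng.normal(size=(5000, 64))
selected = cloud_sample(open_feats, client_summarize(client_feats), budget=1000)
```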
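
The Dataset Splits row describes the client/cloud partition in prose. A minimal sketch of that rule is given below, assuming a hypothetical `load_domain` loader that returns the samples of one domain; the 50% DomainNet split and the seeds {1, 2, 3} follow the quoted description.

```python
# Sketch of the reported client/cloud splits; `load_domain` is a hypothetical loader.
import random

def split_digits(domains, client_domain):
    """Digits: one domain goes to the client, all remaining domains form the cloud's open-source pool."""
    client = load_domain(client_domain)
    cloud = [s for d in domains if d != client_domain for s in load_domain(d)]
    return client, cloud

def split_domainnet(domains, client_domain, seed):
    """DomainNet: the client gets a random 50% of one domain; the rest plus all other domains go to the cloud."""
    rng = random.Random(seed)            # repeated with seeds {1, 2, 3}
    samples = load_domain(client_domain)
    rng.shuffle(samples)
    half = len(samples) // 2
    client = samples[:half]
    cloud = samples[half:] + [s for d in domains if d != client_domain for s in load_domain(d)]
    return client, cloud
```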
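
The Experiment Setup row quotes a two-stage schedule: a linear head trained for 30 epochs under the true labels and the teacher model f_t, followed by fine-tuning of the full network f_s for 500 epochs. The PyTorch-style sketch below only illustrates that freeze-then-unfreeze schedule; the model attributes (`backbone`, `head`), optimizer settings, and the distillation loss weight are assumptions not given in the quoted text.

```python
# Sketch of the reported two-stage schedule; models, loaders, and loss weights are assumed.
import torch
import torch.nn.functional as F

def train_student(student, teacher, loader, head_epochs=30, full_epochs=500, alpha=0.5, lr=1e-3):
    teacher.eval()
    # Stage 1: train only the linear classifier head (30 epochs), backbone frozen.
    for p in student.backbone.parameters():
        p.requires_grad = False
    opt = torch.optim.SGD(student.head.parameters(), lr=lr, momentum=0.9)
    run_epochs(student, teacher, loader, opt, head_epochs, alpha)
    # Stage 2: fine-tune the full network (500 epochs).
    for p in student.parameters():
        p.requires_grad = True
    opt = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)
    run_epochs(student, teacher, loader, opt, full_epochs, alpha)

def run_epochs(student, teacher, loader, opt, epochs, alpha):
    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():
                t_logits = teacher(x)                      # soft supervision from the teacher f_t
            s_logits = student(x)
            # True-label loss plus a simple soft-label (teacher) loss.
            loss = F.cross_entropy(s_logits, y) + alpha * F.kl_div(
                F.log_softmax(s_logits, dim=1),
                F.softmax(t_logits, dim=1),
                reduction="batchmean",
            )
            opt.zero_grad()
            loss.backward()
            opt.step()
```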