Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Differentially Private Bagging: Improved utility and cheaper privacy than subsample-and-aggregate

Authors: James Jordon, Jinsung Yoon, Mihaela van der Schaar

NeurIPS 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the improvements our model makes over standard subsample-and-aggregate in two datasets (Heart Failure (private) and UCI Adult (public)). |
| Researcher Affiliation | Academia | James Jordon (University of Oxford, EMAIL); Jinsung Yoon (University of California, Los Angeles, EMAIL); Mihaela van der Schaar (University of Cambridge; University of California, Los Angeles; Alan Turing Institute; EMAIL, EMAIL) |
| Pseudocode | Yes | Algorithm 1: Semi-supervised differentially private knowledge transfer using multiple partitions |
| Open Source Code | Yes | Implementation of DPBag can be found at https://bitbucket.org/mvdschaar/mlforhealthlabpub/src/master/alg/dpbag/. |
| Open Datasets | Yes | We demonstrate the improvements our model makes over standard subsample-and-aggregate in two datasets (Heart Failure (private) and UCI Adult (public)). |
| Dataset Splits | Yes | We randomly divide the data into 3 disjoint subsets: (1) a training set (33%), (2) public data (33%), (3) a testing set (33%). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'logistic regression' and 'Gradient Boosting Method (GBM)' as models but does not specify software names with version numbers for implementation or dependencies. |
| Experiment Setup | Yes | We set δ = 10⁻⁵. We vary ε ∈ {1, 3, 5}, n ∈ {50, 100, 250} and k ∈ {10, 50, 100}. In all cases we set λ = 2/n. |
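The Dataset Splits and Experiment Setup rows can be read together as a concrete evaluation protocol: a random three-way 33/33/33 split, then a sweep over the (ε, n, k) grid with δ = 10⁻⁵ and λ = 2/n. Below is a minimal sketch of that protocol; the function names and overall structure are assumptions for illustration, not the authors' released DPBag code.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def three_way_split(X):
    """Randomly divide rows into disjoint train / public / test thirds,
    as described in the paper's Dataset Splits quote (33% / 33% / 33%)."""
    idx = rng.permutation(len(X))
    a, b = len(X) // 3, 2 * len(X) // 3
    return X[idx[:a]], X[idx[a:b]], X[idx[b:]]

# Hyperparameter grid from the Experiment Setup row.
delta = 1e-5
grid = list(itertools.product([1, 3, 5],        # epsilon
                              [50, 100, 250],   # n
                              [10, 50, 100]))   # k

for eps, n, k in grid:
    lam = 2 / n  # "In all cases we set lambda = 2/n"
    # ... here one would train teachers over k partitions and aggregate
    # their votes with noise calibrated to (eps, delta) -- omitted.
```

The grid yields 3 × 3 × 3 = 27 configurations per dataset; the split function only illustrates the disjoint-thirds partition, not any stratification the authors may have used.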