Privacy Implications of Shuffling

Authors: Casey Meehan, Amrita Roy Chowdhury, Kamalika Chaudhuri, Somesh Jha

Venue: ICLR 2022

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | We evaluate on four datasets. We are not aware of any prior work that provides comparable local inferential privacy. Hence, we baseline our mechanism with the two extremes: standard LDP and uniform random shuffling. For concreteness, we detail our procedure with the PUDF dataset (PUD) (license), which comprises n ≈ 29k psychiatric patient records from Texas. Each data owner's sensitive value x_i is their medical payment method, which is reflective of socioeconomic class (such as Medicaid or charity). Public auxiliary information t ∈ T is the hospital's geolocation. (See the baseline sketch after the table.)
Researcher Affiliation | Academia | Casey Meehan¹, Amrita Roy-Chowdhury², Kamalika Chaudhuri¹, Somesh Jha²; ¹UC San Diego, ²University of Wisconsin–Madison
Pseudocode | Yes | Algorithm 1: d_σ-private Shuffling Mechanism
Open Source Code | Yes | A .zip file demonstrating code of each experiment has been uploaded as supplementary material.
Open Datasets | Yes | We evaluate on four datasets. ... PUDF dataset (PUD) (license), ... Twitch (Rozemberczki et al., 2019). ... Adult (Dua & Graff, 2017).
Dataset Splits | No | The paper describes using "an equal sized test set" but does not specify explicit training, validation, or test split percentages or exact counts needed to reproduce the data partitioning. It also does not mention a separate validation set.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions a "gradient boosted decision tree (GBDT) model (Friedman, 2001)" and "Platt scaling (Niculescu-Mizil & Caruana, 2005)", but these are references to methods/papers, not specific software libraries with version numbers. (See the calibration sketch after the table.)
Experiment Setup | No | The paper mentions "Using an ϵ = 2.5 randomized response mechanism, we resample the LDP sequence y 50 times" and "We implement Cal as a gradient boosted decision tree (GBDT) model", but lacks specific hyperparameters (e.g., learning rate, batch size) or detailed configuration settings for these models or the overall experimental setup. (See both sketches after the table.)
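
To ground the two baseline extremes quoted under Research Type, here is a minimal sketch of ϵ-randomized response followed by a uniform random shuffle, using the ϵ = 2.5 and 50-resample settings quoted in the Experiment Setup row; the paper's own Algorithm 1 (the d_σ-private shuffling mechanism) sits between these two extremes. The alphabet size `k`, the synthetic records, and all names (`randomized_response`, `y_samples`) are assumptions for illustration, not the paper's released code.

```python
import numpy as np

rng = np.random.default_rng(0)

def randomized_response(x, k, epsilon, rng):
    """k-ary randomized response: keep the true value x in {0, ..., k-1}
    with probability e^eps / (e^eps + k - 1); otherwise report one of the
    other k - 1 values uniformly. This is a standard epsilon-LDP mechanism."""
    p_keep = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    keep = rng.random(x.shape) < p_keep
    repl = rng.integers(0, k - 1, size=x.shape)
    repl = repl + (repl >= x)           # shift so the true value is skipped
    return np.where(keep, x, repl)

# epsilon and the 50 resamples are quoted in the table; k (number of
# payment-method categories) and the records themselves are assumed.
epsilon, n_resamples, k = 2.5, 50, 4
x = rng.integers(0, k, size=29_000)     # stand-in for the n ≈ 29k PUDF records

# Extreme 1: standard LDP ("we resample the LDP sequence y 50 times").
y_samples = [randomized_response(x, k, epsilon, rng) for _ in range(n_resamples)]

# Extreme 2: uniform random shuffling of an LDP sequence.
y_shuffled = rng.permutation(y_samples[0])
```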
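
The Software Dependencies and Experiment Setup rows name a GBDT calibration model Cal with Platt scaling but no library or hyperparameters. Below is a hedged sketch assuming scikit-learn, where `CalibratedClassifierCV(method="sigmoid")` implements Platt scaling; the synthetic data, the equal-sized split, and the default GBDT hyperparameters are placeholders, not the paper's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.calibration import CalibratedClassifierCV

# Placeholder features/labels standing in for the paper's (unspecified) inputs.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
# "An equal sized test set" is quoted in the Dataset Splits row; 50/50 assumed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# GBDT with default hyperparameters (none are reported), wrapped in Platt
# scaling: method="sigmoid" fits a logistic regressor to the GBDT scores.
gbdt = GradientBoostingClassifier(random_state=0)
cal = CalibratedClassifierCV(gbdt, method="sigmoid", cv=5)
cal.fit(X_train, y_train)

calibrated_probs = cal.predict_proba(X_test)  # calibrated posterior estimates
```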