Privacy Implications of Shuffling

Authors: Casey Meehan, Amrita Roy Chowdhury, Kamalika Chaudhuri, Somesh Jha

Venue: ICLR 2022

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | We evaluate on four datasets. We are not aware of any prior work that provides comparable local inferential privacy. Hence, we baseline our mechanism with the two extremes: standard LDP and uniform random shuffling. For concreteness, we detail our procedure with the PUDF dataset (PUD) (license), which comprises n ≈ 29k psychiatric patient records from Texas. Each data owner's sensitive value x_i is their medical payment method, which is reflective of socioeconomic class (such as Medicaid or charity). Public auxiliary information t ∈ T is the hospital's geolocation. (See the baseline sketch after the table.)
Researcher Affiliation | Academia | Casey Meehan¹, Amrita Roy-Chowdhury², Kamalika Chaudhuri¹, Somesh Jha²; ¹UC San Diego, ²University of Wisconsin–Madison
Pseudocode | Yes | Algorithm 1: d_σ-private Shuffling Mechanism
Open Source Code | Yes | A .zip file demonstrating code of each experiment has been uploaded as supplementary material.
Open Datasets | Yes | We evaluate on four datasets. ... PUDF dataset (PUD) (license), ... Twitch (Rozemberczki et al., 2019). ... Adult (Dua & Graff, 2017).
Dataset Splits | No | The paper describes using "an equal sized test set" but does not specify explicit training, validation, or test split percentages or exact counts needed to reproduce the data partitioning. It also does not mention a separate validation set.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions a "gradient boosted decision tree (GBDT) model (Friedman, 2001)" and "Platt scaling (Niculescu-Mizil & Caruana, 2005)", but these are references to methods/papers, not specific software libraries with version numbers. (See the calibration sketch after the table.)
Experiment Setup | No | The paper mentions "Using an ϵ = 2.5 randomized response mechanism, we resample the LDP sequence y 50 times" and "We implement Cal as a gradient boosted decision tree (GBDT) model", but lacks specific hyperparameters (e.g., learning rate, batch size) or detailed configuration settings for these models or the overall experimental setup. (See both sketches after the table.)
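
To ground the two baseline extremes quoted under Research Type, here is a minimal sketch of ϵ-randomized response followed by a uniform random shuffle, using the ϵ = 2.5 and 50-resample settings quoted in the Experiment Setup row; the paper's own Algorithm 1 (the d_σ-private shuffling mechanism) sits between these two extremes. The alphabet size `k`, the synthetic records, and all names (`randomized_response`, `y_samples`) are assumptions for illustration, not the paper's released code.

```python
import numpy as np

rng = np.random.default_rng(0)

def randomized_response(x, k, epsilon, rng):
    """k-ary randomized response: keep the true value x in {0, ..., k-1}
    with probability e^eps / (e^eps + k - 1); otherwise report one of the
    other k - 1 values uniformly. This is a standard epsilon-LDP mechanism."""
    p_keep = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    keep = rng.random(x.shape) < p_keep
    repl = rng.integers(0, k - 1, size=x.shape)
    repl = repl + (repl >= x)           # shift so the true value is skipped
    return np.where(keep, x, repl)

# epsilon and the 50 resamples are quoted in the table; k (number of
# payment-method categories) and the records themselves are assumed.
epsilon, n_resamples, k = 2.5, 50, 4
x = rng.integers(0, k, size=29_000)     # stand-in for the n ≈ 29k PUDF records

# Extreme 1: standard LDP ("we resample the LDP sequence y 50 times").
y_samples = [randomized_response(x, k, epsilon, rng) for _ in range(n_resamples)]

# Extreme 2: uniform random shuffling of an LDP sequence.
y_shuffled = rng.permutation(y_samples[0])
```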
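
The Software Dependencies and Experiment Setup rows name a GBDT calibration model Cal with Platt scaling but no library or hyperparameters. Below is a hedged sketch assuming scikit-learn, where `CalibratedClassifierCV(method="sigmoid")` implements Platt scaling; the synthetic data, the equal-sized split, and the default GBDT hyperparameters are placeholders, not the paper's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.calibration import CalibratedClassifierCV

# Placeholder features/labels standing in for the paper's (unspecified) inputs.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
# "An equal sized test set" is quoted in the Dataset Splits row; 50/50 assumed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# GBDT with default hyperparameters (none are reported), wrapped in Platt
# scaling: method="sigmoid" fits a logistic regressor to the GBDT scores.
gbdt = GradientBoostingClassifier(random_state=0)
cal = CalibratedClassifierCV(gbdt, method="sigmoid", cv=5)
cal.fit(X_train, y_train)

calibrated_probs = cal.predict_proba(X_test)  # calibrated posterior estimates
```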