Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Pure Differential Privacy for Functional Summaries with a Laplace-like Process

Authors: Haotian Lin, Matthew Reimherr

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments on synthetic and real datasets demonstrate the effectiveness of the proposed mechanism. Keywords: Differential Privacy, Functional Data Analysis, Hilbert Space, Reproducing Kernel Hilbert Space, Infinite-Dimensional Statistics.
Researcher Affiliation | Academia | Haotian Lin, Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA; Matthew Reimherr, Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
Pseudocode | Yes | Algorithm 1: Approximated ICLP mechanism
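Algorithm 1 itself is not reproduced on this page. As a rough illustration only, the Python sketch below assumes that an approximated ICLP draw is a truncated expansion $\sum_{j \le J} \sqrt{\lambda_j}\, \ell_j \phi_j$ with iid Laplace coefficients $\ell_j$, where $(\lambda_j, \phi_j)$ are eigenpairs of a discretized covariance kernel, and that the mechanism adds this noise, scaled by sensitivity/$\epsilon$, to a functional summary evaluated on a grid. The helper names (`matern32`, `approx_iclp_noise`, `privatize`), the Matérn 3/2 kernel, the truncation level and the unit Laplace scale are all hypothetical choices; the paper's exact construction, sensitivity norm and calibration may differ.

```python
import numpy as np

def matern32(s, t, rho=0.1):
    # Matérn kernel with smoothness 3/2 (assumed parameterization, unit variance)
    r = np.abs(s[:, None] - t[None, :]) / rho
    return (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)

def approx_iclp_noise(grid, n_components, rng, rho=0.1):
    """Hypothetical helper: one truncated Laplace-coefficient expansion on a grid."""
    m = len(grid)
    # Dividing by m approximates the integral operator on [0, 1]
    eigvals, eigvecs = np.linalg.eigh(matern32(grid, grid, rho) / m)
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest J eigenvalues
    lam = np.clip(eigvals[order], 0.0, None)          # drop tiny negative round-off
    phi = eigvecs[:, order] * np.sqrt(m)              # grid average of phi_j**2 equals 1
    coef = rng.laplace(loc=0.0, scale=1.0, size=n_components)  # iid Laplace coefficients
    return phi @ (np.sqrt(lam) * coef)

def privatize(summary, grid, sensitivity, epsilon, n_components=50, seed=0):
    """Hypothetical mechanism: summary + (sensitivity / epsilon) * truncated noise."""
    rng = np.random.default_rng(seed)
    noise = approx_iclp_noise(grid, n_components, rng)
    return summary + (sensitivity / epsilon) * noise
```

For example, `privatize(np.sin(2 * np.pi * grid), grid, sensitivity=0.5, epsilon=1.0)` returns one privatized curve on `grid`. Truncating at $J$ components is what makes the draw computable; how the paper chooses $J$ and controls the truncation error is not shown here.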
Open Source Code | No | The paper provides licensing information and a link to its JMLR page, but no direct link to source code for the methodology it describes.
Open Datasets | Yes | The first dataset is the brain-scan Diffusion Tensor Imaging (DTI) dataset. The DTI dataset provides fractional anisotropy (FA) tract profiles for the corpus callosum (CCA) and the right corticospinal tract (RCST) for patients with multiple sclerosis and for controls. Specifically, we study the CCA dataset, which includes 382 patients measured at 93 equally spaced locations of the CCA. The second dataset contains historical electricity demand in Adelaide: half-hourly electricity demand from Sunday to Saturday between July 6, 1997, and March 31, 2007. Our analysis focuses on Mondays, so the dataset consists of measurements from 508 days at 48 equally spaced time points. ... The mortality data for each region are collected from the United Nations World Population Prospects 2019 Databases (footnote 4: available at https://population.un.org/wpp/Download). The dataset records the number of deaths for each region and age.
Dataset Splits | No | The paper describes simulation experiments and real-world applications in which data are used for estimation, but it does not specify train/validation/test splits. The Monte Carlo evaluation it mentions (e.g., "Monte Carlo by generating 1000 privatized mean estimators") assesses the proposed mechanism rather than partitioning data for model training.
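The Monte Carlo evaluation quoted above can be sketched as privatizing a fixed mean estimate many times and averaging the squared $L^2$ distance to the non-private estimate. This Python sketch is hypothetical and is not the paper's simulation code: `laplace_expansion_noise` uses a cosine basis with $\lambda_j = j^{-4}$ and iid Laplace coefficients as a stand-in noise process, `scale` stands in for the sensitivity/$\epsilon$ factor, and the $L^2[0,1]$ norm is approximated by a grid average.

```python
import numpy as np

def laplace_expansion_noise(grid, n_components, rng):
    # Hypothetical stand-in: cosine basis, lambda_j = j**-4, iid Laplace coefficients
    j = np.arange(1, n_components + 1)
    lam = j ** -4.0
    basis = np.sqrt(2.0) * np.cos(np.pi * np.outer(grid, j))  # orthonormal in L2[0, 1]
    coef = rng.laplace(0.0, 1.0, size=n_components)
    return basis @ (np.sqrt(lam) * coef)

def monte_carlo_privacy_error(mean_est, grid, scale, n_rep=1000, n_components=50, seed=0):
    """Average squared L2 distance between privatized and non-private mean estimates."""
    rng = np.random.default_rng(seed)
    errs = np.empty(n_rep)
    for r in range(n_rep):
        private = mean_est + scale * laplace_expansion_noise(grid, n_components, rng)
        # Grid average approximates the squared L2 norm on [0, 1]
        errs[r] = np.mean((private - mean_est) ** 2)
    return errs.mean()
```

With the default `n_rep=1000` this mirrors the "1000 privatized mean estimators" in the quote. Because a fixed seed reuses the same noise draws, the returned error scales with the square of `scale`, which gives a quick sanity check.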
Hardware Specification | No | The paper includes a table (Table 1) comparing computation times for generating ICLPs and Gaussian processes, but it does not specify the hardware (e.g., CPU or GPU model, memory) on which these computations were performed.
Software Dependencies | No | The paper mentions specific R packages, such as diffpriv, refund, and fds, but it does not give version numbers for these packages or for R itself, which are needed to reproduce the software environment.
Experiment Setup | Yes | In this section, we conduct the simulation for the mean-function privacy protection problem discussed in Section 4.2. We use the isotropic Matérn kernel (Cressie and Huang, 1999) as the covariance kernel for the ICLP noise. It takes the form ... In the following experiments, we set $d = 1$, $\rho = 0.1$, the privacy budget $\epsilon = 1$, and $\alpha = 1.5$, so that $\lambda_j \asymp j^{-4}$. ... For both ICLP-AR and ICLP-QR, we set $\eta$ and $\psi_{\mathrm{PSS}}$ to the values in Theorem 12 so that the privacy error is of the same order as the estimation error. For PCV, we obtain $\psi_{\mathrm{PCV}}$ by 10-fold PCV over the range $[0.1\psi_{\mathrm{PSS}}, 10\psi_{\mathrm{PSS}}]$. ... We use $h \sim n^{-1/(4+d)}$ with $d = 1, 2$ to ensure that privacy comes for free while remaining privacy-safe.
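The stated decay $\lambda_j \asymp j^{-4}$ for $\alpha = 1.5$ and $d = 1$ agrees with the usual Matérn eigenvalue rate $j^{-(2\alpha + d)/d}$. A quick numerical check, assuming the common closed form of the smoothness-3/2 Matérn kernel $k(r) = (1 + \sqrt{3}\, r/\rho)\exp(-\sqrt{3}\, r/\rho)$ on $[0, 1]$ (the paper's exact form is elided above and may differ by a rescaling of $\rho$), is to fit the log-log slope of the eigenvalues of the discretized kernel. The grid size `m` and the index range `j_lo`..`j_hi` are arbitrary, hypothetical choices.

```python
import numpy as np

def matern32(s, t, rho=0.1):
    # Matérn kernel with smoothness 3/2 (assumed parameterization, unit variance)
    r = np.abs(s[:, None] - t[None, :]) / rho
    return (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)

def eigen_decay_slope(m=400, rho=0.1, j_lo=20, j_hi=80):
    """Least-squares slope of log(lambda_j) against log(j) for j in [j_lo, j_hi]."""
    grid = (np.arange(m) + 0.5) / m                      # midpoints of a uniform partition of [0, 1]
    K = matern32(grid, grid, rho) / m                    # discretized integral operator
    lam = np.sort(np.linalg.eigvalsh(K))[::-1]           # eigenvalues, largest first
    j = np.arange(j_lo, j_hi + 1)
    slope, _ = np.polyfit(np.log(j), np.log(lam[j - 1]), 1)
    return slope
```

The fitted slope should land near $-4$. The lower end of the index range is kept away from small $j$ because the first few eigenvalues sit before the asymptotic regime, where $\rho = 0.1$ flattens the spectrum.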