Privately detecting changes in unknown distributions

Authors: Rachel Cummings, Sara Krehbiel, Yuliia Lut, Wanrong Zhang

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Section 5 we report experimental results that empirically validate our theoretical results. We start by applying our PNCPD algorithm to a real-world dataset of stock price time-series data that appear by visual inspection to contain a change-point, and we find that our algorithm finds the correct change-point with minimal error, even for small ϵ values. We then apply our PNCPD algorithm to simulated datasets sampled from Gaussian distributions, varying the parameters corresponding to the size of the distributional change, the location of the change-point in the dataset, and ϵ.
Researcher Affiliation | Academia | (1) H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA; (2) Department of Mathematics and Computer Science, Santa Clara University, Santa Clara, California, USA.
Pseudocode | Yes | Algorithm 1 Private Nonparametric Change-Point Detector: PNCPD(X, ϵ, γ) ... Algorithm 2 Online Private Nonparametric Change-Point Detector: ONLINEPNCPD(X, n, ϵ, γ, T)
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is openly available.
Open Datasets | Yes | We use a dataset from (Cao et al., 2018), which contains stock price data over time, with prices collected every second over a span of 5 hours on October 9, 2012.
Dataset Splits | No | The paper describes using synthetic datasets for simulation and repeating processes multiple times (e.g., 'This process is repeated 10^3 times for each value of k and µ1.'), but it does not specify explicit training, validation, or test dataset splits or a detailed splitting methodology for reproduction.
Hardware Specification | No | The paper does not provide any specific details regarding the hardware (e.g., CPU, GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not list specific software dependencies, such as library names with version numbers, that would be needed to replicate the experiments.
Experiment Setup | Yes | We use n = 200 observations with true change k = 50, 100, 150. This process is repeated 10^3 times for each value of k and µ1. We consider the performance of our algorithm for γ = 0.1 and ϵ = 0.1, 1, 5, ∞, where ϵ = ∞ corresponds to the non-private problem, which serves as our baseline. ... We use n = 200 observations where the true drift change occurs at time t = 100, and repeat the process 10^3 times. We modify the observations X to create a new sample Y = {y1, ..., yn/2}, and apply our PNCPD algorithm to this new sample. Figure 3 plots the empirical accuracy β = Pr[|t − k| > α] as a function of α for γ = 0.1 and ϵ = 0.1, 1, 5, ∞, where ϵ = ∞ is our non-private baseline.
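Since no source code is released, the simulation loop in the Experiment Setup row can only be approximated. The sketch below is one possible harness, not the authors' implementation. It draws n = 200 Gaussian observations with a mean shift at k, repeats the draw 10^3 times, and computes the empirical accuracy β as the fraction of runs whose error exceeds α. The detector is a hypothetical non-private stand-in (a scaled difference-of-means search restricted to the range set by γ) and is not PNCPD. The function names, the means mu0 = 0 and mu1 = 1, the unit variance, and the seed are all assumptions made for illustration.

```python
import random


def simulate_sample(n, k, mu0, mu1, rng, sigma=1.0):
    """Draw n Gaussian observations whose mean shifts from mu0 to mu1 after index k."""
    return [rng.gauss(mu0 if i < k else mu1, sigma) for i in range(n)]


def placeholder_detector(xs, gamma):
    """Non-private stand-in, NOT PNCPD (assumption for illustration only).

    Returns the split k in [gamma*n, (1-gamma)*n] that maximizes the
    scaled absolute difference between the left and right sample means.
    """
    n = len(xs)
    lo = max(1, int(gamma * n))
    hi = min(n - 1, int((1 - gamma) * n))
    total = sum(xs)
    prefix = sum(xs[:lo])  # running sum of xs[:k]
    best_k, best_stat = lo, float("-inf")
    for k in range(lo, hi + 1):
        left_mean = prefix / k
        right_mean = (total - prefix) / (n - k)
        stat = abs(left_mean - right_mean) * (k * (n - k) / n) ** 0.5
        if stat > best_stat:
            best_k, best_stat = k, stat
        prefix += xs[k]
    return best_k


def run_trials(n=200, k=100, mu0=0.0, mu1=1.0, gamma=0.1, trials=1000, seed=0):
    """Return the absolute estimation error of each simulated run."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        xs = simulate_sample(n, k, mu0, mu1, rng)
        errors.append(abs(placeholder_detector(xs, gamma) - k))
    return errors


def empirical_beta(errors, alpha):
    """Fraction of runs whose error exceeds alpha, i.e. an estimate of Pr[error > alpha]."""
    return sum(e > alpha for e in errors) / len(errors)


if __name__ == "__main__":
    for k in (50, 100, 150):
        errors = run_trials(k=k)
        curve = [(alpha, empirical_beta(errors, alpha)) for alpha in (0, 5, 10, 20)]
        print(k, curve)
```

To reproduce the private curves, one would replace `placeholder_detector` with an implementation of Algorithm 1 and sweep ϵ over 0.1, 1, 5, and the non-private baseline. The harness itself does not depend on which detector is plugged in.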