Privately detecting changes in unknown distributions
Authors: Rachel Cummings, Sara Krehbiel, Yuliia Lut, Wanrong Zhang
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 5 we report experimental results that empirically validate our theoretical results. We start by applying our PNCPD algorithm to a real-world dataset of stock price time-series data that appear by visual inspection to contain a change-point, and we find that our algorithm finds the correct change-point with minimal error, even for small ϵ values. We then apply our PNCPD algorithm to simulated datasets sampled from Gaussian distributions, varying the parameters corresponding to the size of the distributional change, the location of the change-point in the dataset, and ϵ. |
| Researcher Affiliation | Academia | ¹H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA; ²Department of Mathematics and Computer Science, Santa Clara University, Santa Clara, California, USA. |
| Pseudocode | Yes | Algorithm 1 Private Nonparametric Change-Point Detector: PNCPD(X, ϵ, γ) ... Algorithm 2 Online Private Nonparametric Change-Point Detector: ONLINEPNCPD(X, n, ϵ, γ, T) (a hedged code sketch of Algorithm 1 appears below the table) |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is openly available. |
| Open Datasets | Yes | We use a dataset from (Cao et al., 2018), which contains stock price data over time, with prices collected every second over a span of 5 hours on October 9, 2012. |
| Dataset Splits | No | The paper describes using synthetic datasets for simulation and repeating processes multiple times (e.g., 'This process is repeated 10³ times for each value of k and µ1.'), but it does not specify explicit training, validation, or test dataset splits or a detailed splitting methodology for reproduction. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies, such as library names with version numbers, that were used to replicate the experiment. |
| Experiment Setup | Yes | We use n = 200 observations with true change k = 50, 100, 150. This process is repeated 10³ times for each value of k and µ1. We consider the performance of our algorithm for γ = 0.1 and ϵ = 0.1, 1, 5, ∞, where ϵ = ∞ corresponds to the non-private problem, which serves as our baseline. ... We use n = 200 observations where the true drift change occurs at time t = 100, and repeat the process 10³ times. We modify the observations X to create a new sample Y = {y1, . . . , yn/2}, and apply our PNCPD algorithm to this new sample. Figure 3 plots the empirical accuracy β = Pr[|t̂ − k| > α] as a function of α for γ = 0.1 and ϵ = 0.1, 1, 5, ∞, where ϵ = ∞ is our non-private baseline. (a simulation sketch mirroring this setup appears below the table) |
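
The Pseudocode row names Algorithm 1, PNCPD(X, ϵ, γ), but no source code is released. Below is a minimal sketch, assuming the detector scores every candidate change-point in the γ-trimmed range with a Mann-Whitney-style two-sample statistic and privatizes the selection via report-noisy-max with Laplace noise. The statistic's normalization, the noise scale `2 / (ϵ·γ·n)`, and the name `pncpd_sketch` are assumptions for illustration, not the paper's exact calibration.

```python
import numpy as np

def pncpd_sketch(x, eps, gamma, rng=None):
    """Offline private nonparametric change-point sketch (report-noisy-max).

    For each candidate change-point k in the gamma-trimmed range, compare
    x[:k] with x[k:] using a Mann-Whitney-style statistic, add Laplace
    noise, and return the candidate with the largest noisy score. The
    normalization and the noise scale 2 / (eps * gamma * n) are assumed
    calibrations, not the ones proved in the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    n = len(x)
    lo = int(np.ceil(gamma * n))
    hi = int(np.floor((1.0 - gamma) * n))
    candidates = np.arange(lo, hi + 1)

    scores = np.empty(len(candidates))
    for i, k in enumerate(candidates):
        left, right = x[:k], x[k:]
        # Fraction of pairs (i <= k < j) with x_i > x_j: close to 1/2 when
        # both sides share a distribution, pulled toward 0 or 1 after a
        # change that shifts the distribution up or down.
        frac = np.mean(left[:, None] > right[None, :])
        scores[i] = abs(frac - 0.5)

    if np.isinf(eps):
        # eps = infinity: non-private baseline, no noise added.
        noise = np.zeros(len(candidates))
    else:
        # Changing one observation moves each statistic by O(1/(gamma*n)),
        # so Laplace noise at this scale is one plausible calibration.
        noise = rng.laplace(scale=2.0 / (eps * gamma * n),
                            size=len(candidates))
    return int(candidates[np.argmax(scores + noise)])
```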
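
The Experiment Setup row describes n = 200 Gaussian observations with a true change at k ∈ {50, 100, 150}, 10³ repetitions, and ϵ ∈ {0.1, 1, 5, ∞}. The sketch below mirrors that loop, reusing `pncpd_sketch` from above, and reports β = Pr[|t̂ − k| > α] over a grid of α values. The pre-change N(0, 1) / post-change N(µ1, 1) parameterization, the α grid, and µ1 = 1 in the example run are assumptions, not the paper's exact settings.

```python
import numpy as np

def empirical_accuracy(k_true, mu1, eps, alpha_grid, n=200, gamma=0.1,
                       trials=10**3, seed=0):
    """Estimate beta(alpha) = Pr[|t_hat - k_true| > alpha] by simulation.

    Pre-change samples are drawn from N(0, 1) and post-change samples from
    N(mu1, 1); these Gaussian parameters are assumptions for this sketch.
    """
    rng = np.random.default_rng(seed)
    errors = np.empty(trials)
    for t in range(trials):
        x = np.concatenate([rng.normal(0.0, 1.0, size=k_true),
                            rng.normal(mu1, 1.0, size=n - k_true)])
        t_hat = pncpd_sketch(x, eps, gamma, rng=rng)
        errors[t] = abs(t_hat - k_true)
    return {alpha: float(np.mean(errors > alpha)) for alpha in alpha_grid}

# Example run mirroring the reported grid: change at k = 100, gamma = 0.1,
# eps in {0.1, 1, 5, inf}, 10^3 trials each (unoptimized; each setting may
# take on the order of a minute).
if __name__ == "__main__":
    for eps in (0.1, 1.0, 5.0, np.inf):
        beta = empirical_accuracy(k_true=100, mu1=1.0, eps=eps,
                                  alpha_grid=range(0, 51, 10))
        print(f"eps = {eps}: {beta}")
```

Passing ϵ = ∞ simply disables the Laplace noise inside `pncpd_sketch`, so the same loop produces the non-private baseline described in the setup.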