Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Single and Multiple Change-Point Detection with Differential Privacy

Authors: Wanrong Zhang, Sara Krehbiel, Rui Tuo, Yajun Mei, Rachel Cummings

JMLR 2021 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type: Experimental
LLM Response: Finally, we run several Monte Carlo experiments to validate our theoretical results for both the online and offline settings. We consider data drawn from Bernoulli distributions, which satisfies our first distributional assumption, as well as Gaussian and Gamma distributions, which satisfy our second distributional assumptions. Our offline experiments are summarized in Figure 1, which shows that change-point detection is easier when P0 and P1 are further apart and harder when the privacy requirement is stronger (ϵ is smaller). Additionally, these experiments enhance our theoretical results, finding that Offline PCPD performs well even when we relax the assumptions required for our theoretical accuracy bounds by running our algorithm on imperfect hypotheses P0 and P1 that are closer together than the true distributions from which data are drawn. Figure 3 shows that Online PCPD also performs well, consistent with our theoretical guarantees. (...) In this section, we present results from Monte Carlo experiments designed to validate the theoretical results of previous sections. Our simulations consider both offline (Section 5.1) and online settings (Section 5.2) for the canonical problems of detecting a change in the mean of Bernoulli or Gaussian distributions.
Researcher Affiliation: Academia
LLM Response: Wanrong Zhang (EMAIL), H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA; Sara Krehbiel (EMAIL), Department of Math and Computer Science, Santa Clara University, Santa Clara, CA 95053, USA; Rui Tuo (EMAIL), Wm Michael Barnes '64 Department of Industrial and Systems Engineering, Texas A&M University, College Station, TX 77843, USA; Yajun Mei (EMAIL), H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA; Rachel Cummings (EMAIL), H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
Pseudocode: Yes
LLM Response:
Algorithm 1 Report Noisy Max: Report Max(X, , {f1, ..., fm}, ϵ)
Algorithm 2 Above Noisy Threshold: Above Thresh(X, , {f1, f2, ...}, T, ϵ)
Algorithm 3 Offline private change-point detector: Offline PCPD(X, P0, P1, ϵ, n)
Algorithm 4 Offline private change-point detector: Offline PTCPD(X, P0, P1, ϵ, n, A)
Algorithm 5 Online private change-point detector: Online PCPD(X, P0, P1, ϵ, n, T)
Algorithm 6 Online private multiple change-point detector: Online PMCPD(X, P0, ..., Pm, ϵ, n, T1, ..., Tm)
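As a reading aid, the Report Noisy Max mechanism named in Algorithm 1 can be sketched in a few lines. This is a generic sketch of the standard mechanism (Laplace noise added to each query before taking the argmax), not the paper's own implementation; the `laplace` helper and the 1/ϵ noise scale (which assumes sensitivity-1 queries) are illustrative assumptions.

```python
import math
import random

def laplace(scale, rng):
    # Inverse-CDF sampling from a Laplace(0, scale) distribution.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def report_noisy_max(X, queries, epsilon, rng):
    """Sketch of Report Noisy Max: perturb each query value with
    Laplace noise and return the index of the largest noisy value.
    The 1/epsilon scale assumes sensitivity-1 queries; the paper's
    exact calibration may differ."""
    noisy = [f(X) + laplace(1.0 / epsilon, rng) for f in queries]
    return max(range(len(noisy)), key=lambda i: noisy[i])
```

When the query values are far apart relative to the noise scale, the mechanism returns the true argmax with high probability, which is what makes it useful for picking the most likely change point privately.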
Open Source Code: No
LLM Response: The paper does not contain any explicit statement about providing source code or a link to a code repository. The license mentioned (CC-BY 4.0) pertains to the paper itself, not an accompanying software release.
Open Datasets: No
LLM Response: Our simulations consider both offline (Section 5.1) and online settings (Section 5.2) for the canonical problems of detecting a change in the mean of Bernoulli or Gaussian distributions. In the offline setting, we additionally show that our algorithms can accurately detect changes in the variance of Gaussian distribution and detect changes in the shape parameter of a Gamma distribution. (...) Each simulation is characterized by a probability distribution family (Bernoulli, Gaussian, or Gamma), a distribution parameter that changes (mean, standard deviation, or shape), and a change magnitude (large, small, or underspecified).
Dataset Splits: No
LLM Response: The paper describes Monte Carlo simulations where data is generated based on specified distributions and parameters (e.g., "n = 200 observations where the true change occurs at time k = 100"). It does not use pre-existing datasets with defined training, validation, or test splits.
Hardware Specification: No
LLM Response: The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory specifications) used for running the experiments or simulations.
Software Dependencies: No
LLM Response: The paper does not provide specific details or version numbers for any software libraries, programming languages, or tools used in the implementation or experimentation.
Experiment Setup: Yes
LLM Response: We vary privacy parameter ϵ = 0.1, 0.5, 1 and ∞, representing the non-private case. For each of our simulations, we use n = 200 observations where the true change occurs at time k = 100. This process is repeated 10^4 times. (...) For the Gaussian distributions, we truncate the log-likelihoods in the main algorithm and call Offline PTCPD with A = 0.1. (...) The new challenge is to choose an appropriate sliding window size n and corresponding threshold T in order to achieve good overall accuracy. The window size of n = 200 used in the offline simulations does not permit any threshold that reasonably controls both false positive and false negative rates, so we choose a larger window size of n = 700 and restrict our online simulations to ϵ = 0.5, 1, ∞. (...) For the Bernoulli model, this resulted in a choice of T = 220 for all values of ϵ = 0.5, 1, ∞. In the Gaussian model, we chose T = 8, 4.5, 100 for ϵ = 0.5, 1, ∞, respectively.
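The offline simulation described above (Bernoulli data, n = 200, true change at k = 100, with the change point reported via a noisy argmax over log-likelihood scores) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the function names `offline_change_point` and `laplace`, the sensitivity bound `A`, and the noise calibration are placeholders for the paper's exact constants.

```python
import math
import random

def laplace(scale, rng):
    # Inverse-CDF sampling from a Laplace(0, scale) distribution.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def offline_change_point(x, p0, p1, epsilon, rng):
    """Illustrative offline private change-point detector for Bernoulli
    data: score each candidate change time k by the log-likelihood of
    P0 before k and P1 from k on, then report the noisy argmax."""
    def loglik(k):
        s = 0.0
        for i, xi in enumerate(x):
            p = p1 if i >= k else p0
            s += math.log(p if xi == 1 else 1.0 - p)
        return s
    # A bounds how much one data point can move any score (assumed calibration).
    A = abs(math.log(p1 / p0)) + abs(math.log((1 - p1) / (1 - p0)))
    noisy = {k: loglik(k) + laplace(A / epsilon, rng) for k in range(1, len(x))}
    return max(noisy, key=noisy.get)

# n = 200 Bernoulli observations with a true change at time k = 100.
rng = random.Random(7)
x = [1 if rng.random() < (0.2 if i < 100 else 0.8) else 0 for i in range(200)]
k_hat = offline_change_point(x, p0=0.2, p1=0.8, epsilon=1.0, rng=rng)
```

Repeating runs like this over many trials and over a grid of ϵ values (with very large ϵ approximating the non-private case) reproduces the shape of the experiment the excerpt describes: detection accuracy degrades as ϵ shrinks or as P0 and P1 move closer together.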