Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Dirichlet Mechanism for Differentially Private KL Divergence Minimization
Authors: Donlapark Ponnoprat
TMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on real-world datasets demonstrate advantages of our algorithm over Gaussian and Laplace mechanisms in supervised classification and maximum likelihood estimation. We compare the Dirichlet mechanism against the Gaussian and Laplace mechanisms for two learning tasks: naïve Bayes classification and maximum likelihood estimation of Bayesian networks; both tasks can be done with KL divergence minimization. Experiments on real-world datasets show that the Dirichlet mechanism provides smaller cross-entropy loss in classification, and larger log-likelihood in parameter estimation, than the other mechanisms at the same level of privacy guarantee. |
| Researcher Affiliation | Academia | Donlapark Ponnoprat EMAIL, Department of Statistics, Chiang Mai University |
| Pseudocode | Yes | Algorithm 1 (λ, ε)-RDP Dirichlet mechanism |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | In this experiment, the naïve Bayes models with differentially private mechanisms are used to classify 8 UCI datasets (Dua & Graff, 2017) with diverse number of instances/attributes/classes. The details of the datasets are shown in Table 1. |
| Dataset Splits | Yes | For each dataset, we use a 70-30 train-test split. Before fitting the models, numerical attributes are transformed into categorical ones using quantile binning, where the number of bins is fixed at 10. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions that polygamma functions, root-finding methods and Dirichlet distributions are readily available in many scientific programming languages, but it does not specify any particular software or library names with version numbers. |
| Experiment Setup | Yes | For all privacy mechanisms, we fix λ = 5 and study their performances as ε increases from 10^-3 to 10. We also add the random guessing model, which is a (λ, 0)-RDP model, as the baseline. The classification performances, measured in cross-entropy (CE) loss and accuracy on the test sets, are shown in Figures 4 and 5. ... To modify the model with the Dirichlet mechanism, we sample (π1, ..., πd) ∼ Dirichlet(r(N1, ..., Nd) + α), where r and α are chosen according to Algorithm 1 (with σ^2 = 2 and Δ = 1) to attain (λ, ε/(K + 1))-RDP. |
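The sampling step quoted above, (π1, ..., πd) ∼ Dirichlet(r(N1, ..., Nd) + α), can be sketched as follows. This is a minimal illustration, not the paper's implementation: the paper's Algorithm 1 derives `r` and `alpha` from the privacy parameters (λ, ε, σ², Δ), whereas here they are taken as plain inputs with illustrative values.

```python
import numpy as np

def dirichlet_mechanism(counts, r, alpha, rng=None):
    """Sample a private probability vector from Dirichlet(r * counts + alpha).

    counts : category counts (N_1, ..., N_d) from the data
    r, alpha : scaling and offset; in the paper these are chosen by
        Algorithm 1 to meet a target (lambda, eps)-RDP guarantee,
        but here they are supplied directly for illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts, dtype=float)
    # The concentration parameters shrink toward alpha as r decreases,
    # which is what injects privacy-protecting randomness.
    return rng.dirichlet(r * counts + alpha)

# Example: privatize a 3-category empirical distribution
# (r=0.5 and alpha=1.0 are placeholder values, not Algorithm 1 outputs).
probs = dirichlet_mechanism([40, 35, 25], r=0.5, alpha=1.0)
```

The output is a valid probability vector (non-negative, summing to 1), so it can drop into a naïve Bayes model in place of the empirical class-conditional probabilities.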
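The preprocessing described under Dataset Splits (a 70-30 train-test split after quantile-binning numerical attributes into 10 bins) can be sketched like this. The function and variable names are hypothetical; the paper does not specify its implementation.

```python
import numpy as np

def quantile_bin(column, n_bins=10):
    """Discretize a numeric column into n_bins quantile bins (labels 0..n_bins-1)."""
    # Interior quantile cut points; ties can merge bins in practice,
    # which this sketch does not handle specially.
    edges = np.quantile(column, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.searchsorted(edges, column, side="right")

rng = np.random.default_rng(0)
x = rng.normal(size=1000)          # stand-in for one numerical attribute
binned = quantile_bin(x)           # now categorical, as the experiments require

# 70-30 train-test split on the binned data
idx = rng.permutation(len(x))
split = int(0.7 * len(x))
train, test = binned[idx[:split]], binned[idx[split:]]
```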