Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Dirichlet Mechanism for Differentially Private KL Divergence Minimization
Authors: Donlapark Ponnoprat
TMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on real-world datasets demonstrate advantages of our algorithm over Gaussian and Laplace mechanisms in supervised classification and maximum likelihood estimation. We compare the Dirichlet mechanism against the Gaussian and Laplace mechanisms for two learning tasks: naïve Bayes classification and maximum likelihood estimation of Bayesian networks; both tasks can be done with KL divergence minimization. Experiments on real-world datasets show that the Dirichlet mechanism provides smaller cross-entropy loss in classification, and larger log-likelihood in parameter estimation, than the other mechanisms at the same level of privacy guarantee. |
| Researcher Affiliation | Academia | Donlapark Ponnoprat EMAIL, Department of Statistics, Chiang Mai University |
| Pseudocode | Yes | Algorithm 1 (λ, ε)-RDP Dirichlet mechanism |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | In this experiment, the naïve Bayes models with differentially private mechanisms are used to classify 8 UCI datasets (Dua & Graff, 2017) with diverse number of instances/attributes/classes. The details of the datasets are shown in Table 1. |
| Dataset Splits | Yes | For each dataset, we use a 70-30 train-test split. Before fitting the models, numerical attributes are transformed into categorical ones using quantile binning, where the number of bins is fixed at 10. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions that polygamma functions, root-finding methods and Dirichlet distributions are readily available in many scientific programming languages, but it does not specify any particular software or library names with version numbers. |
| Experiment Setup | Yes | For all privacy mechanisms, we fix λ = 5 and study their performances as ε increases from 10^-3 to 10. We also add the random guessing model, which is a (λ, 0)-RDP model, as the baseline. The classification performances, measured in cross-entropy (CE) loss and accuracy on the test sets, are shown in Figures 4 and 5. ... To modify the model with the Dirichlet mechanism, we sample (π1, ..., πd) ∼ Dirichlet(r(N1, ..., Nd) + α), where r and α are chosen according to Algorithm 1 (with σ^2 = 2 and Δ = 1) to attain (λ, ε/(K + 1))-RDP. |
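The sampling step quoted above, (π1, ..., πd) ∼ Dirichlet(r(N1, ..., Nd) + α), can be sketched as follows. This is a minimal illustration, not the paper's implementation: the paper's Algorithm 1 derives `r` and `alpha` from the privacy parameters (λ, ε, σ², Δ), whereas here they are taken as plain inputs with illustrative values.

```python
import numpy as np

def dirichlet_mechanism(counts, r, alpha, rng=None):
    """Sample a private probability vector from Dirichlet(r * counts + alpha).

    counts : category counts (N_1, ..., N_d) from the data
    r, alpha : scaling and offset; in the paper these are chosen by
        Algorithm 1 to meet a target (lambda, eps)-RDP guarantee,
        but here they are supplied directly for illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts, dtype=float)
    # The concentration parameters shrink toward alpha as r decreases,
    # which is what injects privacy-protecting randomness.
    return rng.dirichlet(r * counts + alpha)

# Example: privatize a 3-category empirical distribution
# (r=0.5 and alpha=1.0 are placeholder values, not Algorithm 1 outputs).
probs = dirichlet_mechanism([40, 35, 25], r=0.5, alpha=1.0)
```

The output is a valid probability vector (non-negative, summing to 1), so it can drop into a naïve Bayes model in place of the empirical class-conditional probabilities.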
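The preprocessing described under Dataset Splits (a 70-30 train-test split after quantile-binning numerical attributes into 10 bins) can be sketched like this. The function and variable names are hypothetical; the paper does not specify its implementation.

```python
import numpy as np

def quantile_bin(column, n_bins=10):
    """Discretize a numeric column into n_bins quantile bins (labels 0..n_bins-1)."""
    # Interior quantile cut points; ties can merge bins in practice,
    # which this sketch does not handle specially.
    edges = np.quantile(column, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.searchsorted(edges, column, side="right")

rng = np.random.default_rng(0)
x = rng.normal(size=1000)          # stand-in for one numerical attribute
binned = quantile_bin(x)           # now categorical, as the experiments require

# 70-30 train-test split on the binned data
idx = rng.permutation(len(x))
split = int(0.7 * len(x))
train, test = binned[idx[:split]], binned[idx[split:]]
```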