Differentially Private Sum-Product Networks

Authors: Xenia Heilmann, Mattia Cerrato, Ernst Althaus

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that our approach outperforms the state of the art in terms of stability (i.e. the number of training runs required for convergence) and utility of the generated data.
Researcher Affiliation | Academia | Institute of Computer Science, Johannes Gutenberg University Mainz, Saarstraße 21, Mainz 55122, Rhineland-Palatinate, Germany. Correspondence to: Xenia Heilmann <xenia.heilmann@uni-mainz.de>.
Pseudocode | Yes | Algorithm 1 Learn DPSPN; Algorithm 2 Learn DPSPN class
Open Source Code | Yes | Code for all the experiments is available at https://github.com/xheilmann/DPSPN.
Open Datasets | Yes | For classification tasks, we provide results for five datasets with continuous, discrete and binary variables: cervical cancer (mis, 2019), german-credit (Hofmann, 1994), diabetes (Semerdjian & Frank, 2017), bank (Moro et al., 2014) and adult (Becker & Kohavi, 1996) (see Appendix B, Table 3 for dataset statistics).
Dataset Splits | No | The paper does not state specific training/validation/test split percentages or a splitting methodology for the datasets used in its experiments. It mentions 'test AUROC' and 'real test data' but gives no details on how the data was partitioned.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper mentions software components and algorithms (e.g., K-means, Logistic Regression, and PyTorch implicitly through references) but does not provide version numbers for any of them, which limits reproducibility.
Experiment Setup | Yes | For DPSPNs, we performed a grid search over η ∈ {0.1N, 0.2N, ..., N} (with N denoting the size of the dataset) and α ∈ {0.1, 0.3, 0.5, 0.7, 0.9} to give an intuition of which hyperparameters to choose for unknown datasets. [...] We conducted experiments for ε ∈ {0.1, 0.5, 1, 5, 10, 100} and kept a fixed δ = 10⁻⁶. Additionally, for DPSPNs we set the maximum number of privacy-consuming function calls on the data, t, to 10.
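The experiment-setup row above fully determines the hyperparameter grid. As a minimal sketch of how that grid could be enumerated (variable names `etas`, `alphas`, `epsilons` and the example dataset size `N` are illustrative, not taken from the DPSPN codebase):

```python
# Hypothetical enumeration of the grid described in the paper's setup.
# eta is scanned over fractions 0.1N ... 1.0N of the dataset size N,
# alpha over {0.1, 0.3, 0.5, 0.7, 0.9}, and the privacy budget epsilon
# over {0.1, 0.5, 1, 5, 10, 100}; delta and t are held fixed.
from itertools import product

N = 1000  # example dataset size; the paper defines eta relative to N
etas = [round(f * N) for f in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)]
alphas = [0.1, 0.3, 0.5, 0.7, 0.9]
epsilons = [0.1, 0.5, 1, 5, 10, 100]
delta = 1e-6   # fixed across all runs
t_max = 10     # cap on privacy-consuming function calls on the data

grid = list(product(etas, alphas, epsilons))
print(len(grid))  # 10 * 5 * 6 = 300 configurations
```

Each of the 300 (η, α, ε) triples would correspond to one training configuration, with δ and t shared across the whole sweep.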