Differentially Private Sum-Product Networks
Authors: Xenia Heilmann, Mattia Cerrato, Ernst Althaus
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that our approach outperforms the state of the art in terms of stability (i.e. number of training runs required for convergence) and utility of the generated data. |
| Researcher Affiliation | Academia | 1Institute of Computer Science, Johannes Gutenberg University Mainz, Saarstraße 21, Mainz 55122, Rhineland Palatinate, Germany. Correspondence to: Xenia Heilmann <xenia.heilmann@uni-mainz.de>. |
| Pseudocode | Yes | Algorithm 1 Learn DPSPN; Algorithm 2 Learn DPSPN class |
| Open Source Code | Yes | Code for all the experiments is available at https://github.com/xheilmann/DPSPN. |
| Open Datasets | Yes | For classification tasks, we provide results for five datasets with continuous, discrete and binary variables: cervical cancer (mis, 2019), german-credit (Hofmann, 1994), diabetes (Semerdjian & Frank, 2017), bank (Moro et al., 2014) and adult (Becker & Kohavi, 1996) (see Appendix B, Table 3 for dataset statistics). |
| Dataset Splits | No | The paper does not explicitly state specific training/validation/test split percentages or methodology for the datasets used in its experiments. It mentions using 'test AUROC' and 'real test data' but no details on how data was partitioned for training, validation, and testing. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions software components and algorithms (e.g., K-means, Logistic Regression, PyTorch implicitly through references) but does not provide specific version numbers for any of these to ensure reproducibility. |
| Experiment Setup | Yes | For DPSPNs, we performed a grid search over η ∈ {0.1N, 0.2N, ..., N} (with N defining the size of the dataset) and α ∈ {0.1, 0.3, 0.5, 0.7, 0.9} to give an intuition to which hyperparameters to choose for unknown datasets. [...] We conducted experiments for ε ∈ {0.1, 0.5, 1, 5, 10, 100} and kept a fixed δ = 10⁻⁶. Additionally, for DPSPNs we set the maximum of privacy-consuming function calls on the data t to 10. |
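
The grid described in the Experiment Setup row can be enumerated as in the sketch below. This is only an illustration of the search space, not the authors' code: the function name `build_grid`, the dictionary keys, and the example dataset size of 1000 are hypothetical, η is rounded down to an integer number of samples, and δ is assumed to be 10⁻⁶.

```python
from itertools import product


def build_grid(n_samples: int) -> list[dict]:
    """Enumerate every (epsilon, eta, alpha) combination from the setup row.

    All names here are illustrative assumptions, not identifiers from the
    DPSPN repository.
    """
    # eta in {0.1N, 0.2N, ..., N}; integer arithmetic avoids float drift.
    etas = [(k * n_samples) // 10 for k in range(1, 11)]
    alphas = [0.1, 0.3, 0.5, 0.7, 0.9]
    epsilons = [0.1, 0.5, 1, 5, 10, 100]
    delta = 1e-6            # fixed across all runs (assumed value)
    max_private_calls = 10  # t: cap on privacy-consuming calls on the data

    return [
        {
            "epsilon": eps,
            "eta": eta,
            "alpha": alpha,
            "delta": delta,
            "t": max_private_calls,
        }
        for eps, eta, alpha in product(epsilons, etas, alphas)
    ]


# Hypothetical dataset size, used only to make the sketch concrete.
grid = build_grid(1000)
```

With 6 values of ε, 10 of η, and 5 of α, the grid holds 300 configurations per dataset, each of which would be passed to a training routine that is not shown here.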