Differentially Private Sum-Product Networks

Authors: Xenia Heilmann, Mattia Cerrato, Ernst Althaus

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that our approach outperforms the state of the art in terms of stability (i.e. the number of training runs required for convergence) and utility of the generated data.
Researcher Affiliation | Academia | Institute of Computer Science, Johannes Gutenberg University Mainz, Saarstraße 21, Mainz 55122, Rhineland-Palatinate, Germany. Correspondence to: Xenia Heilmann <xenia.heilmann@uni-mainz.de>.
Pseudocode | Yes | Algorithm 1 Learn DPSPN; Algorithm 2 Learn DPSPN class
Open Source Code | Yes | Code for all the experiments is available at https://github.com/xheilmann/DPSPN.
Open Datasets | Yes | For classification tasks, we provide results for five datasets with continuous, discrete and binary variables: cervical cancer (mis, 2019), german-credit (Hofmann, 1994), diabetes (Semerdjian & Frank, 2017), bank (Moro et al., 2014) and adult (Becker & Kohavi, 1996) (see Appendix B, Table 3 for dataset statistics).
Dataset Splits | No | The paper does not state specific training/validation/test split percentages or a splitting methodology for the datasets used in its experiments. It mentions 'test AUROC' and 'real test data' but gives no details on how the data was partitioned.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper mentions software components and algorithms (e.g., K-means, Logistic Regression, and PyTorch implicitly through references) but does not provide version numbers for any of them, which limits reproducibility.
Experiment Setup | Yes | For DPSPNs, we performed a grid search over η ∈ {0.1N, 0.2N, ..., N} (with N denoting the size of the dataset) and α ∈ {0.1, 0.3, 0.5, 0.7, 0.9} to give an intuition of which hyperparameters to choose for unknown datasets. [...] We conducted experiments for ε ∈ {0.1, 0.5, 1, 5, 10, 100} and kept a fixed δ = 10⁻⁶. Additionally, for DPSPNs we set the maximum number of privacy-consuming function calls on the data, t, to 10.
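The experiment-setup row above fully determines the hyperparameter grid. As a minimal sketch of how that grid could be enumerated (variable names `etas`, `alphas`, `epsilons` and the example dataset size `N` are illustrative, not taken from the DPSPN codebase):

```python
# Hypothetical enumeration of the grid described in the paper's setup.
# eta is scanned over fractions 0.1N ... 1.0N of the dataset size N,
# alpha over {0.1, 0.3, 0.5, 0.7, 0.9}, and the privacy budget epsilon
# over {0.1, 0.5, 1, 5, 10, 100}; delta and t are held fixed.
from itertools import product

N = 1000  # example dataset size; the paper defines eta relative to N
etas = [round(f * N) for f in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)]
alphas = [0.1, 0.3, 0.5, 0.7, 0.9]
epsilons = [0.1, 0.5, 1, 5, 10, 100]
delta = 1e-6   # fixed across all runs
t_max = 10     # cap on privacy-consuming function calls on the data

grid = list(product(etas, alphas, epsilons))
print(len(grid))  # 10 * 5 * 6 = 300 configurations
```

Each of the 300 (η, α, ε) triples would correspond to one training configuration, with δ and t shared across the whole sweep.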