Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Differentially Private Neural Tangent Kernels (DP-NTK) for Privacy-Preserving Data Generation

Authors: Yilin Yang, Kamil Adamczewski, Xiaoxiao Li, Danica J. Sutherland, Mijung Park

JAIR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test our method, DP-NTK, on popular benchmark image datasets such as MNIST, Fashion MNIST, CelebA, and CIFAR10, as well as on 8 benchmark tabular datasets. For MNIST/FMNIST we use a conditional CNN as the generator. We use ResNet18 as our generator for CIFAR10 and CelebA, and a fully connected network for tabular data: for homogeneous data, we use 2 hidden layers + ReLU + batch norm; for heterogeneous data, 3 hidden layers + ReLU + batch norm plus an additional sigmoid layer for the categorical features. (A hedged sketch of these tabular generators appears after the table.)
Researcher Affiliation | Academia | Yilin Yang (EMAIL), Department of Computer Science, University of British Columbia, Vancouver, Canada. Kamil Adamczewski (EMAIL), ETH Zurich, Switzerland. Xiaoxiao Li (EMAIL), Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, Canada; Vector Institute, Toronto, Canada. Danica J. Sutherland (EMAIL), Department of Computer Science, University of British Columbia, Vancouver, Canada; Alberta Machine Intelligence Institute, Edmonton, Canada. Mijung Park (EMAIL), Department of Computer Science, University of British Columbia, Vancouver, Canada; Alberta Machine Intelligence Institute, Edmonton, Canada; Department of Applied Mathematics and Computer Science, Technical University of Denmark, Denmark.
Pseudocode | Yes | Algorithm 1 DP-NTK. (A hedged sketch of the general recipe appears after the table.)
Open Source Code | Yes | Our code is at https://github.com/FreddieNeverLeft/DP-NTK. The readme file of this repository and Appendix A contain hyper-parameter settings (e.g., the architectural choices used for the model whose e-NTK we take) for reproducibility.
Open Datasets | Yes | We test our method, DP-NTK, on popular benchmark image datasets such as MNIST, Fashion MNIST, CelebA, and CIFAR10, as well as on 8 benchmark tabular datasets.
Dataset Splits | No | The paper mentions the use of datasets like MNIST, Fashion MNIST, CelebA, CIFAR10, and 8 tabular datasets, but it does not explicitly provide information about how these datasets were split into training, validation, or test sets, nor does it refer to specific predefined splits with citations. The hyperparameter table in Appendix A lists batch sizes and iterations but not dataset splits.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper mentions "PyTorch (Paszke et al., 2019)" and "the auto-dp package of Wang et al. (2019)" but does not specify version numbers for these software dependencies, which is required for reproducibility.
Experiment Setup | Yes | Appendix A. Hyperparameters Used in Experiments (columns: dataset, iter, d_code, NTK width, batch, lr, eps, architecture):
dmnist 2000 5 800 5000 0.01 10, 1, 0.2 fc 1l
fmnist 2000 5 800 5000 0.01 10, 1, 0.2 fc 1l
celeba 20000 141 3000 200 1000 0.01 10 fc 2l
cifar10 40000 201 3000 200 1000 0.01 10 1 fc 2l
cifar10 20000 31 800 1000 200 0.01 None fc 2l
adult 50 11 30 200 200 0.01 1 cnn 2l
census 2000 21 30 20 200 0.01 1 cnn 2l
cervical 500 11 800 1000 200 0.01 1 cnn 2l
credit 500 11 1500 200 0.01 1 fc 1l
epileptic 2000 101 50 20 200 0.01 1 cnn 2l
isolet 1000 21 10 20 200 0.01 1 cnn 2l
covtype 1000 101 100 20 200 0.01 1 cnn 2l
intrusion 1000 21 30 1000 200 0.01 1 fc 2l
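
The Research Type row quotes the paper's generator choices; the sketch below is a rough PyTorch rendering of the described tabular generators only (2 hidden layers + ReLU + batch norm for homogeneous data; 3 hidden layers + ReLU + batch norm plus a sigmoid over the categorical outputs for heterogeneous data). The layer width, the numerical/categorical output split, and all names are assumptions for illustration, not the authors' code; the repository linked above contains the actual architectures.

```python
import torch
import torch.nn as nn

def make_homogeneous_generator(d_code, d_out, width=200):
    # "2 hidden layers + ReLU + batch norm" for homogeneous tabular data
    # (width=200 is an assumed value, not taken from the paper).
    return nn.Sequential(
        nn.Linear(d_code, width), nn.BatchNorm1d(width), nn.ReLU(),
        nn.Linear(width, width), nn.BatchNorm1d(width), nn.ReLU(),
        nn.Linear(width, d_out),
    )

class HeterogeneousGenerator(nn.Module):
    # "3 hidden layers + ReLU + batch norm plus an additional sigmoid layer
    # for the categorical features" for heterogeneous tabular data.
    def __init__(self, d_code, d_numerical, d_categorical, width=200):
        super().__init__()
        self.d_numerical = d_numerical
        self.body = nn.Sequential(
            nn.Linear(d_code, width), nn.BatchNorm1d(width), nn.ReLU(),
            nn.Linear(width, width), nn.BatchNorm1d(width), nn.ReLU(),
            nn.Linear(width, width), nn.BatchNorm1d(width), nn.ReLU(),
            nn.Linear(width, d_numerical + d_categorical),
        )

    def forward(self, z):
        out = self.body(z)
        numerical = out[:, :self.d_numerical]
        categorical = torch.sigmoid(out[:, self.d_numerical:])
        return torch.cat([numerical, categorical], dim=1)

# Example: sample 64 synthetic rows from a latent code of dimension 5.
gen = HeterogeneousGenerator(d_code=5, d_numerical=6, d_categorical=4)
fake_rows = gen(torch.randn(64, 5))
```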
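For orientation on Algorithm 1, the sketch below illustrates the general recipe DP-NTK builds on: embed the data with the empirical NTK feature map of a small network (the per-sample gradient of its output with respect to its parameters), release a Gaussian-mechanism-privatized mean embedding of the private data once, and then train a generator so that its own mean embedding matches the noisy target. This is a minimal interpretation, not the authors' implementation; the network sizes, the noise multiplier `sigma`, and helper names such as `ntk_features` are assumptions, and the privacy calibration (done with the auto-dp package in the paper) is omitted.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_x, d_code, ntk_width = 10, 5, 64  # data dim, latent code dim, width of the e-NTK network

# Network whose empirical NTK features we use: phi(x) = grad_theta f_theta(x).
ntk_net = nn.Sequential(nn.Linear(d_x, ntk_width), nn.ReLU(), nn.Linear(ntk_width, 1))

def ntk_features(x):
    # Per-sample gradient of the scalar output w.r.t. the parameters,
    # L2-normalized so each sample contributes at most 1/n to the mean embedding.
    feats = []
    for xi in x:
        out = ntk_net(xi.unsqueeze(0)).squeeze()
        grads = torch.autograd.grad(out, list(ntk_net.parameters()),
                                    create_graph=x.requires_grad)
        phi = torch.cat([g.reshape(-1) for g in grads])
        feats.append(phi / phi.norm())
    return torch.stack(feats)

# 1) Privatize the mean embedding of the real data once (Gaussian mechanism).
x_real = torch.randn(500, d_x)              # stand-in for the private dataset
mu_real = ntk_features(x_real).mean(dim=0)
sensitivity = 2.0 / x_real.shape[0]         # L2 sensitivity of a normalized mean embedding
sigma = 1.0                                 # noise multiplier; calibrate to (eps, delta) in practice
mu_dp = mu_real + sigma * sensitivity * torch.randn_like(mu_real)

# 2) Train a generator so its mean embedding matches the noisy target.
generator = nn.Sequential(nn.Linear(d_code, 128), nn.ReLU(), nn.Linear(128, d_x))
opt = torch.optim.Adam(generator.parameters(), lr=1e-2)
for _ in range(100):
    x_fake = generator(torch.randn(64, d_code))
    mu_fake = ntk_features(x_fake).mean(dim=0)
    loss = ((mu_dp - mu_fake) ** 2).sum()   # squared distance in the finite e-NTK feature space
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the noisy mean embedding is released only once, the generator can be trained for any number of iterations without further privacy cost; in the paper, the generator and e-NTK architectures are those listed in the Research Type and Experiment Setup rows above.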