Bounding the Invertibility of Privacy-preserving Instance Encoding using Fisher Information
Authors: Kiwan Maeng, Chuan Guo, Sanjay Kariyappa, G. Edward Suh
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the lower bound using different attacks and encoders, and show that dFIL can be used to interpret the invertibility of instance encoding both theoretically and empirically (Section 3.3). We show how dFIL can be used as a practical privacy metric and guide the design of privacy-enhancing training/inference systems with instance encoding (Sections 4 and 5). A hedged restatement of the dFIL bound is sketched after the table. |
| Researcher Affiliation | Collaboration | Kiwan Maeng (Penn State University, kvm6242@psu.edu); Chuan Guo (FAIR, Meta, chuanguo@meta.com); Sanjay Kariyappa (Georgia Institute of Technology, sanjaykariyappa@gatech.edu); G. Edward Suh (FAIR, Meta & Cornell University, edsuh@cornell.edu) |
| Pseudocode | No | The paper does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide a direct link or explicit statement about releasing the source code for the methodology described in this paper. It only references external codebases (e.g., 'https://github.com/huyvnphan/PyTorch_CIFAR10' and 'https://huggingface.co/google/ddpm-cifar10-32') used for specific components or baselines. |
| Open Datasets | Yes | We also evaluated Corollary 1 on MNIST [10] and CIFAR-10 [42]... Models and datasets: We used three different models and four different datasets to cover a wide range of applications: ResNet-18 [25] with CIFAR-10/100 [42] for image classification, MLP-based neural collaborative filtering (NCF-MLP) [26] with MovieLens-20M [24] for recommendation, and DistilBERT [56] with GLUE-SST2 [68] for sentiment analysis... First, we train the model on one of three different datasets: (1) TinyImageNet [58], (2) CIFAR-100 [42], and (3) a held-out 20% of CIFAR-10. |
| Dataset Splits | Yes | First, we train the model on one of three different datasets: (1) TinyImageNet [58], (2) CIFAR-100 [42], and (3) a held-out 20% of CIFAR-10. Then, layers up to block 4 are frozen and used as the encoder. The CIFAR-10 training set is encoded using the encoder and used to finetune the rest of the model. A hedged sketch of this encoder/finetuning split follows the table. |
| Hardware Specification | Yes | All the evaluation was done on a single A100 GPU. |
| Software Dependencies | No | The paper mentions the use of PyTorch models and DistilBert, but it does not specify exact version numbers for these or any other software dependencies needed for replication (e.g., 'PyTorch 1.9' or 'Python 3.8'). |
| Experiment Setup | Yes | For ResNet-18, we used an implementation tuned for the CIFAR-10 dataset from [54], with ReLU replaced with GELU and max pooling replaced with average pooling. We used the default hyperparameters from the repository except for the following: bs=128, lr=0.1, and weight_decay=5×10⁻⁴. For NCF-MLP, we used an embedding dimension of 32 and MLP layers of output size [64, 32, 16, 1]. We trained NCF-MLP with Nesterov SGD with momentum=0.9, lr=0.1, and a batch size of 128 for a single epoch... For DistilBERT, we used the Adam optimizer with a batch size of 16, lr=2×10⁻⁵, β₁=0.9, β₂=0.999, and ε=10⁻⁸. We swept the compression layer channel dimension among 2, 4, 8, 16, and the SNR regularizer λ between 10⁻³ and 100. ...Then, we freeze the layers up to block 4 and trained the rest for 10 epochs with CIFAR-10, with lr=10⁻³ and keeping other hyperparameters the same. A hedged summary of these settings as optimizer configurations follows the table. |
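
For context on the Research Type row: the paper's central quantity is the diagonal Fisher information leakage (dFIL) of an encoder e(·), and its headline lower bound ties dFIL to the error of reconstruction attacks. The following is a hedged restatement from memory rather than a verbatim quote; here I_{e(x)}(x) denotes the Fisher information matrix of the encoding with respect to the d-dimensional input x, and x̂ is an unbiased reconstruction attack.

```latex
% Hedged restatement of the dFIL definition and bound (notation assumed, not quoted).
\mathrm{dFIL}(x) = \frac{1}{d}\,\operatorname{tr}\!\left( I_{e(x)}(x) \right),
\qquad
\mathbb{E}\!\left[ \frac{1}{d}\,\bigl\lVert \hat{x}\bigl(e(x)\bigr) - x \bigr\rVert_2^2 \right]
\;\ge\; \frac{1}{\mathrm{dFIL}(x)} .
```

Under this reading, a smaller dFIL certifies a larger average per-feature reconstruction error; the paper also treats attacks beyond the unbiased case, which this sketch does not cover.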
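The Dataset Splits row describes a split-inference style setup: pretrain the model, freeze everything up to "block 4" as the encoder, then finetune the remaining layers on encoded CIFAR-10. Below is a minimal PyTorch sketch of that flow, assuming torchvision's standard resnet18 layout; the choice of split point (end of layer2, i.e., basic blocks 1–4), the optimizer settings, and all helper names are illustrative assumptions rather than the authors' code.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Assumed pretrained backbone (TinyImageNet, CIFAR-100, or a held-out 20% of CIFAR-10).
# The paper's CIFAR-tuned variant swaps ReLU->GELU and max->average pooling; omitted here.
model = resnet18(num_classes=10)

# Encoder: stem + the first four basic blocks (layer1, layer2), frozen.
# Interpreting "layers up to block 4" this way is an assumption.
encoder = nn.Sequential(model.conv1, model.bn1, model.relu, model.maxpool,
                        model.layer1, model.layer2)
for p in encoder.parameters():
    p.requires_grad = False
encoder.eval()

# Head: the remaining layers, finetuned on encoded CIFAR-10 (row says 10 epochs, lr=1e-3).
head = nn.Sequential(model.layer3, model.layer4, model.avgpool, nn.Flatten(), model.fc)
optimizer = torch.optim.SGD(head.parameters(), lr=1e-3, momentum=0.9, weight_decay=5e-4)

def finetune_step(x, y):
    """One finetuning step on a batch of CIFAR-10 images x with labels y."""
    with torch.no_grad():
        z = encoder(x)              # instance encoding; the encoder is never updated
    loss = nn.functional.cross_entropy(head(z), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```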
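The Experiment Setup row lists per-model training hyperparameters; a compact way to read them is as PyTorch optimizer configurations. The sketch below only transcribes what the row states (optimizer classes and argument names are standard PyTorch); anything the row does not mention, such as schedulers or the ResNet-18 epoch count, is deliberately left out, and the helper names are hypothetical.

```python
import torch

def resnet18_optimizer(params):
    # ResNet-18 on CIFAR-10/100: repository defaults except bs=128, lr=0.1, weight_decay=5e-4.
    return torch.optim.SGD(params, lr=0.1, weight_decay=5e-4)

def ncf_mlp_optimizer(params):
    # NCF-MLP on MovieLens-20M: Nesterov SGD, momentum=0.9, lr=0.1, bs=128, one epoch.
    return torch.optim.SGD(params, lr=0.1, momentum=0.9, nesterov=True)

def distilbert_optimizer(params):
    # DistilBERT on GLUE-SST2: Adam, bs=16, lr=2e-5, betas=(0.9, 0.999), eps=1e-8.
    return torch.optim.Adam(params, lr=2e-5, betas=(0.9, 0.999), eps=1e-8)

# Swept encoder settings quoted in the row.
SWEEP = {
    "compression_channel_dim": [2, 4, 8, 16],   # compression layer channel dimension
    "snr_regularizer_lambda": (1e-3, 1e2),      # lambda swept between 1e-3 and 100
}
```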