Learning to Generate Inversion-Resistant Model Explanations

Authors: Hoyong Jeong, Suyoung Lee, Sung Ju Hwang, Sooel Son

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate GNIME using four datasets: (1) CelebA [17], (2) MNIST [14], (3) CIFAR-10 [13], and (4) ImageNet [6], each of which is freely available for research purposes. We demonstrate that GNIME significantly decreases the information leakage in model explanations, decreasing transferable classification accuracy in facial recognition models by up to 84.8% while preserving the original functionality of model explanations.
Researcher Affiliation | Academia | Hoyong Jeong, Suyoung Lee, Sung Ju Hwang, Sooel Son (KAIST); {yongari38, suyoung.lee, sjhwang82, sl.son}@kaist.ac.kr
Pseudocode | Yes | Algorithm 1: Training algorithm in Phase I (a hedged sketch of such a training loop appears after this table).
Open Source Code | Yes | To facilitate further research, we publish GNIME at https://github.com/WSP-LAB/GNIME.
Open Datasets | Yes | We evaluate GNIME using four datasets: (1) CelebA [17], (2) MNIST [14], (3) CIFAR-10 [13], and (4) ImageNet [6], each of which is freely available for research purposes.
Dataset Splits | No | Specifically, we split this attack dataset with an 80/20 ratio for the train/test split. (No explicit mention of a validation split for model tuning; an illustrative split snippet appears after this table.)
Hardware Specification | Yes | All experiments took place on a system equipped with 512 GB of RAM, two Intel Xeon Gold 6258R CPUs, and four RTX 3090 GPUs.
Software Dependencies | No | The paper does not provide specific version numbers for the software dependencies or libraries used in the experiments.
Experiment Setup | Yes | In LNG, we set λ = 500 for CelebA models and λ = 100 for MNIST, CIFAR-10, and ImageNet-100 models, then deploy the final model after 500 epochs. (An illustrative configuration sketch appears after this table.)
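
The Pseudocode row references Algorithm 1, the Phase I training algorithm. The exact procedure is given in the paper and not quoted here; under the assumption that Phase I fits a surrogate inversion network that reconstructs private inputs from a model's predictions and explanation maps, a minimal PyTorch-style sketch could look like the following. The function and network names, the MSE reconstruction objective, and all hyperparameters are illustrative assumptions, not GNIME's published configuration.

```python
# Hedged sketch (not the paper's Algorithm 1): train a surrogate inversion
# network that reconstructs inputs from (explanation, prediction) pairs.
import torch
import torch.nn as nn

def train_inversion_phase1(inversion_net, target_model, explainer, loader,
                           epochs=50, lr=1e-4):
    """inversion_net: assumed network mapping explanations (and predictions)
    back to input space; target_model: frozen classifier being explained;
    explainer: callable returning an explanation map for (model, x)."""
    optimizer = torch.optim.Adam(inversion_net.parameters(), lr=lr)
    recon_loss = nn.MSELoss()  # pixel-wise reconstruction objective (assumed)
    target_model.eval()
    for _ in range(epochs):
        for x, _ in loader:
            pred = target_model(x).detach()             # model prediction
            expl = explainer(target_model, x).detach()  # explanation map for x
            x_hat = inversion_net(expl, pred)           # reconstructed input
            loss = recon_loss(x_hat, x)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return inversion_net
```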
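The Dataset Splits row quotes an 80/20 train/test split of the attack dataset with no validation split. A minimal sketch of such a split, assuming PyTorch's random_split and a fixed seed (the paper does not specify its splitting utility or seed):

```python
# Illustrative 80/20 train/test split; the splitting utility and seed are
# assumptions, not details taken from the paper.
import torch
from torch.utils.data import random_split

def split_attack_dataset(dataset, train_frac=0.8, seed=0):
    n_train = int(len(dataset) * train_frac)
    n_test = len(dataset) - n_train
    generator = torch.Generator().manual_seed(seed)  # fixed seed for reproducibility
    return random_split(dataset, [n_train, n_test], generator=generator)
```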
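The Experiment Setup row quotes per-dataset values of the trade-off weight λ and a 500-epoch training budget. How λ enters the objective is not quoted here; assuming it balances an explanation-fidelity term against an inversion-leakage term, a small configuration sketch might look like this (the loss composition, names, and dataset keys are assumptions):

```python
# Illustrative configuration reflecting the quoted hyperparameters; the way
# lambda combines the two loss terms is an assumption, not the paper's formula.
LAMBDA_BY_DATASET = {
    "celeba": 500.0,   # "λ = 500 for CelebA models"
    "mnist": 100.0,    # "λ = 100 for MNIST, CIFAR-10, and ImageNet-100 models"
    "cifar10": 100.0,
    "imagenet100": 100.0,
}
EPOCHS = 500  # "deploy the final model after 500 epochs"

def combined_loss(fidelity_term, leakage_term, dataset):
    """Weight an explanation-fidelity term against an anti-inversion (leakage)
    term using the dataset-specific lambda; both terms are scalar tensors
    assumed to be computed elsewhere."""
    return fidelity_term + LAMBDA_BY_DATASET[dataset] * leakage_term
```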