Reconstruction Attacks on Machine Unlearning: Simple Models are Vulnerable

Authors: Martin Bertran, Shuai Tang, Michael Kearns, Jamie H. Morgenstern, Aaron Roth, Steven Z. Wu

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We assess our attack across diverse datasets, including tabular and image data, and for both classification and regression tasks. Initially, we train a model on the complete dataset Xpriv, ypriv to derive parameters β+, and then retrain it excluding a single sample (x, y) to obtain β .
Researcher Affiliation Collaboration Martin Bertran Amazon AWS AI/ML Shuai Tang Jump Trading Michael Kearns University of Pennsylvania Amazon AWS AI/ML Jamie Morgenstern University of Washington Amazon AWS AI/ML Aaron Roth University of Pennsylvania Amazon AWS AI/ML Zhiwei Steven Wu Carnegie Mellon University Amazon AWS AI/ML
Pseudocode Yes Algorithm 1 Generalized Attack Require: Public data Xpub Rm d, ypub Rm Require: Parameter vectors β+, β Rd Require: Loss function ℓ(β) Require: Embedding function ϕ Ensure: Reconstructed sample x Estimate the Hessian ˆH using Eq. (13) Reconstruct the embedding z using Eq. (12) if ϕ(x) = x then Directly recover x = z else Reconstruct the input x using Eq. (6) end if Return x
Open Source Code No Open source code will be provided at a later date
Open Datasets Yes We assess our attack across diverse datasets... on Fashion MNIST (FMNIST), MNIST, and CIFAR10 datasets... (Xiao et al., 2017; Le Cun et al., 1998; Krizhevsky et al., 2009).
Dataset Splits No On each task, we split the dataset into three splits, including 40% for training the target model, another 40% as the public samples for learning shadow models, and the rest 20% as the holdout set for evaluation.
Hardware Specification No The computational costs of the experiments were small enough that they could be run serially on a single GPU machine without great effort.
Software Dependencies No The paper does not provide specific software dependencies with version numbers.
Experiment Setup Yes Each dataset undergoes a normalization process where input features are scaled to the range [ 1, 1].