Certified Data Removal from Machine Learning Models

Authors: Chuan Guo, Tom Goldstein, Awni Hannun, Laurens van der Maaten

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 4 (Experiments): We test our certified removal mechanism in three settings: (1) removal from a standard linear logistic regressor, (2) removal from a linear logistic regressor that uses a feature extractor pre-trained on public data, and (3) removal from a non-linear logistic regressor by using a differentially private feature extractor. Code reproducing the results of our experiments is publicly available from https://github.com/facebookresearch/certified-removal. Table 1 summarizes the training and removal times measured in our experiments.
Researcher Affiliation | Collaboration | (1) Department of Computer Science, Cornell University, New York, USA; (2) Facebook AI Research, New York, USA.
Pseudocode | Yes | Algorithm 1: Training of a certified removal-enabled model. Algorithm 2: Repeated certified removal of data batches.
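For the logistic-regression settings, both algorithms can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code from the facebookresearch/certified-removal repository: the function names grad_hess, train_removal_enabled and newton_remove are hypothetical, plain Newton iterations stand in for whatever solver the authors use, labels are assumed to be ±1, rows of X are assumed to have norm at most 1, and γ = 1/4 is assumed as a Lipschitz constant for the logistic loss's second derivative. Training (our reading of Algorithm 1) minimizes Σ log(1 + exp(−y_i wᵀx_i)) + λn/2·||w||² + bᵀw with b drawn from N(0, σ²I). Removal applies one Newton step w + H⁻¹Δ, where Δ is the removed points' gradient contribution (including their share of the regularizer) and H is the Hessian on the remaining data D', and returns the data-dependent bound γ·||X'||·||H⁻¹Δ||·||X'H⁻¹Δ|| on the gradient residual that Algorithm 2 accumulates.

```python
import numpy as np


def grad_hess(w, X, y, lam, b=None):
    """Gradient and Hessian of sum_i log(1 + exp(-y_i w.x_i)) + lam*n/2*||w||^2 (+ b.w)."""
    n, d = X.shape
    z = y * (X @ w)
    s = np.exp(-np.logaddexp(0.0, z))  # 1 / (1 + exp(z)), computed without overflow
    grad = -X.T @ (y * s) + lam * n * w
    if b is not None:
        grad = grad + b  # linear perturbation: shifts the gradient, not the Hessian
    hess = X.T @ ((s * (1.0 - s))[:, None] * X) + lam * n * np.eye(d)
    return grad, hess


def train_removal_enabled(X, y, lam, sigma, rng, iters=25):
    """Algorithm 1 sketch: draw b ~ N(0, sigma^2 I), then minimize the perturbed objective."""
    b = rng.normal(0.0, sigma, size=X.shape[1])
    w = np.zeros(X.shape[1])
    for _ in range(iters):  # plain Newton iterations (a solver choice for this sketch only)
        g, H = grad_hess(w, X, y, lam, b)
        w = w - np.linalg.solve(H, g)
    return w, b


def newton_remove(w, X, y, lam, remove_idx, gamma=0.25):
    """Remove rows remove_idx with one Newton step; also return a data-dependent
    bound on the norm of the remaining gradient residual (assumes ||x_i|| <= 1)."""
    keep = np.ones(len(y), dtype=bool)
    keep[remove_idx] = False
    # Delta = sum of the removed points' loss gradients + lam * m * w. Calling grad_hess
    # on the m removed rows (without b) yields exactly this, since it scales lam by m.
    delta, _ = grad_hess(w, X[~keep], y[~keep], lam)
    _, H = grad_hess(w, X[keep], y[keep], lam)  # Hessian of the objective on D'
    step = np.linalg.solve(H, delta)  # H^{-1} Delta
    Xk = X[keep]
    bound = gamma * np.linalg.norm(Xk, 2) * np.linalg.norm(step) * np.linalg.norm(Xk @ step)
    return w + step, bound


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm rows, as the bound assumes
    y = np.sign(X @ rng.normal(size=5) + 0.1 * rng.normal(size=200))  # labels in {-1, +1}
    w, b = train_removal_enabled(X, y, lam=0.05, sigma=1.0, rng=rng)
    w_new, bound = newton_remove(w, X, y, lam=0.05, remove_idx=[0, 1, 2])
    print("gradient residual bound after removal:", bound)
```

Because bᵀw is linear in w, the perturbation never enters the Hessian, which is why the same H serves both the unperturbed analysis and the perturbed model. The Newton step is cheap relative to retraining when d is small, which is the regime the linear and feature-extractor settings target.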
Open Source Code | Yes | Code reproducing the results of our experiments is publicly available from https://github.com/facebookresearch/certified-removal.
Open Datasets | Yes | Linear logistic regression: "We first experiment on the MNIST digit classification dataset." Pre-trained feature extractor: "We study two tasks: (1) scene classification on the LSUN dataset and (2) sentiment classification on the Stanford Sentiment Treebank (SST) dataset." Differentially private feature extractor: "We evaluate this approach on the Street View House Numbers (SVHN) digit classification dataset."
Dataset Splits | No | The paper discusses datasets and evaluation on "test accuracy" but does not explicitly provide specific training, validation, and test split percentages, sample counts, or references to predefined splits for reproducibility.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory, or cloud instance types used for running the experiments.
Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., specific Python library versions or deep learning frameworks with versions).
Experiment Setup | Yes | Training a removal-enabled model using Algorithm 1 requires selecting two hyperparameters: the L2-regularization parameter λ and the standard deviation σ of the sampled perturbation vector b. Figure 1 shows the effect of λ and σ on test accuracy and on the expected number of removals supported before re-training. Removal is performed using Algorithm 2 with δ = 1e-4. Elsewhere the paper states: "At removal time, we use Algorithm 2 with ε = 1 and δ = 1e-4 in both experiments."
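The ε and δ settings translate into a retraining budget. As we read the paper's guarantee, the noise scale satisfies σ = cε′/ε with c = sqrt(2 ln(1.5/δ)), so Algorithm 2 may let the summed gradient-residual bounds of successive removals grow to at most σε/c before re-training from scratch. The plain-Python sketch below counts how many removals fit in that budget; the function names noise_constant and removals_before_retrain are hypothetical, and σ = 10 with a per-removal bound of 0.1 are made-up illustrative values, not numbers from the paper.

```python
import math


def noise_constant(delta):
    """c with delta = 1.5 * exp(-c**2 / 2), i.e. c = sqrt(2 * ln(1.5 / delta))."""
    return math.sqrt(2.0 * math.log(1.5 / delta))


def removals_before_retrain(residual_bounds, sigma, eps, delta):
    """Number of removals that fit before the accumulated gradient-residual bound
    exceeds sigma * eps / c, at which point the model is re-trained from scratch."""
    budget = sigma * eps / noise_constant(delta)
    accumulated = 0.0
    for k, bound in enumerate(residual_bounds):
        accumulated += bound
        if accumulated > budget:
            return k  # removal k would overspend the budget: retrain instead
    return len(residual_bounds)


if __name__ == "__main__":
    # Hypothetical numbers: sigma = 10 and a bound of 0.1 per removal.
    print(noise_constant(1e-4))
    print(removals_before_retrain([0.1] * 100, sigma=10.0, eps=1.0, delta=1e-4))
```

With ε = 1 and δ = 1e-4, c ≈ 4.385, so under the made-up σ = 10 the budget is about 2.28 and 22 removals with a bound of 0.1 each fit before re-training. A larger σ buys more removals at the cost of a noisier model, which is the trade-off Figure 1 explores.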