Certified Data Removal from Machine Learning Models
Authors: Chuan Guo, Tom Goldstein, Awni Hannun, Laurens van der Maaten
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Experiments): We test our certified removal mechanism in three settings: (1) removal from a standard linear logistic regressor, (2) removal from a linear logistic regressor that uses a feature extractor pre-trained on public data, and (3) removal from a non-linear logistic regressor by using a differentially private feature extractor. Code reproducing the results of our experiments is publicly available from https://github.com/facebookresearch/certified-removal. Table 1 summarizes the training and removal times measured in our experiments. |
| Researcher Affiliation | Collaboration | (1) Department of Computer Science, Cornell University, New York, USA; (2) Facebook AI Research, New York, USA. |
| Pseudocode | Yes | Algorithm 1 (Training of a certified removal-enabled model) and Algorithm 2 (Repeated certified removal of data batches). |
| Open Source Code | Yes | Code reproducing the results of our experiments is publicly available from https://github.com/facebookresearch/certified-removal. |
| Open Datasets | Yes | We first experiment on the MNIST digit classification dataset. We study two tasks: (1) scene classification on the LSUN dataset and (2) sentiment classification on the Stanford Sentiment Treebank (SST) dataset. We evaluate this approach on the Street View House Numbers (SVHN) digit classification dataset. |
| Dataset Splits | No | The paper discusses datasets and evaluation on "test accuracy" but does not explicitly provide specific training, validation, and test split percentages, sample counts, or references to predefined splits for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory, or cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., specific Python library versions, deep learning frameworks with versions). |
| Experiment Setup | Yes | Training a removal-enabled model using Algorithm 1 requires selecting two hyperparameters: the L2-regularization parameter, λ, and the standard deviation, σ, of the sampled perturbation vector b. Figure 1 shows the effect of λ and σ on test accuracy and the expected number of removals supported before re-training. Removal is performed using Algorithm 2 with δ = 1e-4. At removal time, we use Algorithm 2 with ϵ = 1 and δ = 1e-4 in both experiments. |
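
For context on the Pseudocode and Experiment Setup rows above, the following is a minimal Python sketch of loss-perturbed training and a single Newton-step removal for an L2-regularized linear logistic regressor. It is not the authors' implementation (that is available at the GitHub link quoted above); the function names, the L-BFGS optimizer choice, and the handling of the perturbation vector b in the removal step are illustrative assumptions, and the gradient-residual accounting that decides how many removals a given (ε, δ, σ, λ) budget supports before re-training is omitted.

```python
# Minimal sketch of removal-enabled training and a one-step removal update for a
# linear logistic regressor. Hyperparameter names (lam, sigma) mirror the lambda
# and sigma quoted in the Experiment Setup row; everything else is illustrative.
import numpy as np
from scipy.optimize import minimize


def train_removal_enabled(X, y, lam, sigma, rng):
    """Minimize sum_i log(1 + exp(-y_i x_i^T w)) + (lam/2) ||w||^2 + b^T w,
    where b ~ N(0, sigma^2 I) is a random linear perturbation of the loss
    (in the spirit of Algorithm 1). Labels y must be in {-1, +1}."""
    n, d = X.shape
    b = rng.normal(scale=sigma, size=d)

    def objective(w):
        margins = y * (X @ w)
        return np.logaddexp(0.0, -margins).sum() + 0.5 * lam * w @ w + b @ w

    def gradient(w):
        margins = y * (X @ w)
        s = -y / (1.0 + np.exp(margins))      # derivative of each logistic loss
        return X.T @ s + lam * w + b

    res = minimize(objective, np.zeros(d), jac=gradient, method="L-BFGS-B")
    return res.x, b


def newton_removal_step(w, b, X_remaining, y_remaining, lam):
    """Approximate the minimizer of the perturbed objective on the remaining
    data with one Newton step (in the spirit of Algorithm 2):
    w' = w - H^{-1} g, with g and H evaluated on the remaining points only."""
    margins = y_remaining * (X_remaining @ w)
    p = 1.0 / (1.0 + np.exp(margins))         # sigmoid(-margin) per example
    g = X_remaining.T @ (-y_remaining * p) + lam * w + b
    weights = p * (1.0 - p)                   # per-example Hessian weights
    H = X_remaining.T @ (X_remaining * weights[:, None])
    H += lam * np.eye(X_remaining.shape[1])
    return w - np.linalg.solve(H, g)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    y = np.sign(X[:, 0] + 0.1 * rng.normal(size=200))
    w, b = train_removal_enabled(X, y, lam=1e-2, sigma=1.0, rng=rng)
    # "Remove" the first training point by taking a Newton step on the rest.
    w_removed = newton_removal_step(w, b, X[1:], y[1:], lam=1e-2)
```

In the setting described by the paper, such Newton-step removals are repeated until an accumulated data-dependent residual exceeds the budget implied by ε, δ, σ, and λ, at which point the model is re-trained from scratch; that bookkeeping, as well as the pre-trained and differentially private feature extractors used in the second and third experiments, are outside the scope of this sketch.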