reproducibilityindex.ai

Certified Data Removal from Machine Learning Models

Authors: Chuan Guo, Tom Goldstein, Awni Hannun, Laurens van der Maaten

ICML 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	4. Experiments: We test our certified removal mechanism in three settings: (1) removal from a standard linear logistic regressor, (2) removal from a linear logistic regressor that uses a feature extractor pre-trained on public data, and (3) removal from a non-linear logistic regressor by using a differentially private feature extractor. Code reproducing the results of our experiments is publicly available from https://github.com/facebookresearch/certified-removal. Table 1 summarizes the training and removal times measured in our experiments.
Researcher Affiliation	Collaboration	(1) Department of Computer Science, Cornell University, New York, USA; (2) Facebook AI Research, New York, USA.
Pseudocode	Yes	Algorithm 1: Training of a certified removal-enabled model, and Algorithm 2: Repeated certified removal of data batches.
Open Source Code	Yes	Code reproducing the results of our experiments is publicly available from https://github.com/facebookresearch/certified-removal.
Open Datasets	Yes	We first experiment on the MNIST digit classification dataset. We study two tasks: (1) scene classification on the LSUN dataset and (2) sentiment classification on the Stanford Sentiment Treebank (SST) dataset. We evaluate this approach on the Street View House Numbers (SVHN) digit classification dataset.
Dataset Splits	No	The paper discusses datasets and evaluation on "test accuracy" but does not explicitly provide specific training, validation, and test split percentages, sample counts, or references to predefined splits for reproducibility.
Hardware Specification	No	The paper does not provide specific hardware details such as GPU or CPU models, memory, or cloud instance types used for running the experiments.
Software Dependencies	No	The paper does not specify software dependencies with version numbers (e.g., specific Python library versions, deep learning frameworks with versions).
Experiment Setup	Yes	Training a removal-enabled model using Algorithm 1 requires selecting two hyperparameters: the L2-regularization parameter, λ, and the standard deviation, σ, of the sampled perturbation vector b. Figure 1 shows the effect of λ and σ on test accuracy and the expected number of removals supported before re-training. Removal is performed using Algorithm 2 with δ = 1e-4. At removal time, we use Algorithm 2 with ϵ = 1 and δ = 1e-4 in both experiments.
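
The Experiment Setup row names two training hyperparameters, λ and σ. As a rough illustration of where each one enters, the sketch below trains an L2-regularized logistic regressor whose objective includes a random linear term bᵀw, with b drawn from a zero-mean Gaussian of standard deviation σ. Several things here are assumptions and are not taken from the table above: the exact form of the objective (summed logistic loss + (λn/2)‖w‖² + bᵀw), labels in {-1, +1}, the Newton solver with backtracking, and all function names, which are hypothetical. It is not the authors' released implementation and does not cover the removal step of Algorithm 2.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def perturbed_loss(w, X, y, lam, b):
    """Assumed objective: sum of logistic losses + (lam*n/2)*||w||^2 + b^T w."""
    n = X.shape[0]
    margins = y * (X @ w)
    # logaddexp(0, -m) is a numerically stable log(1 + exp(-m)).
    return np.sum(np.logaddexp(0.0, -margins)) + 0.5 * lam * n * (w @ w) + b @ w


def perturbed_gradient(w, X, y, lam, b):
    """Gradient of perturbed_loss with respect to w."""
    n = X.shape[0]
    s = sigmoid(y * (X @ w))
    return X.T @ (-(1.0 - s) * y) + lam * n * w + b


def train_removal_enabled(X, y, lam, sigma, rng, n_iter=50, tol=1e-10):
    """Hypothetical helper: sample b ~ N(0, sigma^2 I), then minimize the
    perturbed objective with Newton's method and a backtracking line search."""
    n, d = X.shape
    b = sigma * rng.standard_normal(d)  # perturbation vector b
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = perturbed_gradient(w, X, y, lam, b)
        if np.linalg.norm(grad) < tol:
            break
        s = sigmoid(y * (X @ w))
        # Hessian of the logistic terms plus the L2 term; b adds nothing here.
        H = X.T @ (X * (s * (1.0 - s))[:, None]) + lam * n * np.eye(d)
        step = np.linalg.solve(H, grad)
        # Backtracking (Armijo) line search keeps the Newton step from overshooting.
        t, f0 = 1.0, perturbed_loss(w, X, y, lam, b)
        while (perturbed_loss(w - t * step, X, y, lam, b)
               > f0 - 1e-4 * t * (grad @ step)) and t > 1e-8:
            t *= 0.5
        w = w - t * step
    return w, b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 5))
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm features
    y = np.where(X @ np.ones(5) > 0, 1.0, -1.0)    # synthetic +/-1 labels
    w, b = train_removal_enabled(X, y, lam=1e-2, sigma=1.0, rng=rng)
    print("gradient norm at solution:",
          np.linalg.norm(perturbed_gradient(w, X, y, 1e-2, b)))
```

In this sketch a larger σ means a larger random tilt of the objective, and a larger λ pulls the solution toward zero more strongly; both change the learned weights, which is consistent with the table's note that Figure 1 reports their effect on test accuracy.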