What Can We Learn from Unlearnable Datasets?

Authors: Pedro Sandoval-Segura, Vasu Singla, Jonas Geiping, Micah Goldblum, Tom Goldstein

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we analyze properties of unlearnable dataset methods in order to assess their future viability and security promises. We make several findings by analyzing a number of unlearnable datasets developed from diverse objectives and theory. We demonstrate that, in many cases, neural networks can learn generalizable features from unlearnable datasets. We challenge the common belief that unlearnable datasets work due to linearly separable perturbations. We construct a counterexample, demonstrating that linear separability of perturbations is not a necessary condition. Our proposed attack is significantly less complex than recently proposed techniques.
Researcher Affiliation | Collaboration | Pedro Sandoval-Segura¹, Vasu Singla¹, Jonas Geiping¹, Micah Goldblum², Tom Goldstein¹; ¹University of Maryland, ²New York University; {psando, vsingla, jgeiping, tomg}@umd.edu, goldblum@nyu.edu
Pseudocode | Yes | Algorithm 1: Orthogonal Projection
Input: Unlearnable dataset (X, Y) ∈ D̃_train
Output: Recovered dataset (X_r, Y)
1: while not converged do
2:    Sample batch (x, y) from D̃_train
3:    W ← W − η ∇_W L(Wᵀx, y)
4: end while
5: Perform QR decomposition on W to obtain Q matrix
6: X_r ← X − QQᵀX
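The attack trains only a linear classifier on the unlearnable data and then removes the subspace that classifier learned. Below is a minimal NumPy sketch of Algorithm 1, not the authors' released implementation; the flattened-image representation, hyperparameters, and function name are illustrative assumptions:

```python
import numpy as np

def orthogonal_projection(X, y, num_classes=10, lr=0.1, epochs=20, batch_size=128):
    """Sketch of Algorithm 1: recover data by projecting out linearly
    learned perturbation directions.

    X: (n, d) array of flattened unlearnable images; y: (n,) integer labels.
    Hyperparameter values here are assumptions, not the paper's settings.
    """
    n, d = X.shape
    W = np.zeros((d, num_classes))
    for _ in range(epochs):
        perm = np.random.permutation(n)
        for i in range(0, n, batch_size):
            idx = perm[i:i + batch_size]
            xb, yb = X[idx], y[idx]
            # Softmax cross-entropy gradient for the linear model W
            logits = xb @ W
            logits -= logits.max(axis=1, keepdims=True)
            probs = np.exp(logits)
            probs /= probs.sum(axis=1, keepdims=True)
            probs[np.arange(len(yb)), yb] -= 1.0
            W -= lr * (xb.T @ probs) / len(yb)   # SGD step: W <- W - eta * grad
    Q, _ = np.linalg.qr(W)       # orthonormal basis for the learned class directions
    X_r = X - X @ Q @ Q.T        # delete each image's component in span(Q)
    return X_r
```

The QR step orthonormalizes the learned class weight vectors, so the final projection X − QQᵀX removes exactly the directions a linear classifier found most predictive, which is where linearly separable perturbations concentrate.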
Open Source Code | Yes | Code is available at https://github.com/psandovalsegura/learn-from-unlearnable.
Open Datasets | Yes | For all of our experiments, we make use of open-source unlearnable datasets: From Unlearnable Examples [10]... Results in this section are for CIFAR-10 [13], and additional results for SVHN [18], CIFAR-100 [13], and ImageNet [25] subset are in Appendix A.2.2 and A.4.2.
Dataset Splits | Yes | Next, for each checkpoint, we utilize a random subset of 5,000 clean CIFAR-10 training samples (10% of the original training set) to train a new classification head (and keep the feature extractor weights fixed).
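A short PyTorch sketch of this linear-probe protocol, assuming a saved RN-18 checkpoint; the checkpoint path, probe optimizer settings, and epoch count are assumptions rather than values stated in the paper:

```python
import torch
from torch import nn
from torchvision import datasets, transforms
from torchvision.models import resnet18
from torch.utils.data import DataLoader, Subset

# Random 10% clean subset of CIFAR-10 (5,000 samples) for probing.
full = datasets.CIFAR10("data", train=True, download=True,
                        transform=transforms.ToTensor())
idx = torch.randperm(len(full))[:5000]
probe_loader = DataLoader(Subset(full, idx), batch_size=128, shuffle=True)

# Turn a checkpointed RN-18 into a frozen feature extractor by dropping its head.
backbone = resnet18(num_classes=10)
backbone.load_state_dict(torch.load("rn18_epoch_60.pt"))  # hypothetical path
backbone.fc = nn.Identity()            # expose 512-dim penultimate features
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False

head = nn.Linear(512, 10)              # new classification head
optimizer = torch.optim.SGD(head.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

for epoch in range(20):                # probe-training epochs (assumed)
    for x, y in probe_loader:
        with torch.no_grad():
            feats = backbone(x)        # frozen features
        loss = criterion(head(feats), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Because only the linear head is trained, any accuracy it reaches on clean data measures how generalizable the frozen features are, which is the quantity the paper tracks per checkpoint.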
Hardware Specification | No | The paper does not report the hardware used for its experiments (no exact GPU/CPU models, processor speeds, memory amounts, or other machine specifications).
Software Dependencies | No | The paper mentions software components like ResNet-18, VGG-16, GoogleNet, ViT, SGD, L-BFGS, and scikit-learn, but does not provide specific version numbers for any of them.
Experiment Setup | Yes | Training hyperparameters can be found in Appendix A.1. In Section 4.2, we train a number of ResNet-18 (RN-18) [8] models on different unlearnable datasets with cross-entropy loss for 60 epochs using a batch size of 128. We save checkpoints at every epoch of training. For our optimizer, we use SGD with momentum of 0.9 and weight decay of 5×10⁻⁴. We use an initial learning rate of 0.1, which decays using a cosine annealing schedule.
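The same setup can be written as a short PyTorch training loop. Clean CIFAR-10 stands in below for the unlearnable training sets, and torchvision's standard ResNet-18 stands in for the paper's CIFAR variant; both substitutions are assumptions for illustration:

```python
import torch
from torch import nn, optim
from torchvision import datasets, transforms
from torchvision.models import resnet18
from torch.utils.data import DataLoader

# Clean CIFAR-10 as a stand-in for the unlearnable training sets.
train_set = datasets.CIFAR10("data", train=True, download=True,
                             transform=transforms.ToTensor())
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)

model = resnet18(num_classes=10)       # stand-in for the paper's CIFAR RN-18
criterion = nn.CrossEntropyLoss()
# SGD with momentum 0.9, weight decay 5e-4, initial LR 0.1, cosine annealing
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=60)

for epoch in range(60):
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()
    # Save a checkpoint at every epoch, as in Section 4.2
    torch.save(model.state_dict(), f"rn18_epoch_{epoch + 1}.pt")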