Provable and Efficient Dataset Distillation for Kernel Ridge Regression

Authors: Yilan Chen, Wei Huang, Lily Weng

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We verify our theory experimentally and show that our algorithm outperforms previous work such as KIP while being significantly more efficient, e.g., 15840× faster on CIFAR-100.
Researcher Affiliation | Academia | Yilan Chen (UCSD CSE, yic031@ucsd.edu); Wei Huang (RIKEN AIP, wei.huang.vr@riken.jp); Tsui-Wei Weng (UCSD HDSI, lweng@ucsd.edu)
Pseudocode | Yes | Algorithm 1: Dataset distillation for kernel ridge regression (a generic KRR sketch follows the table)
Open Source Code | Yes | Our code is available on GitHub.
Open Datasets | Yes | MNIST [13] (10 classes, 784 dimensions, 60,000 training images); CIFAR-10 [12] (10 classes, 3,072 dimensions, 50,000 training images); CIFAR-100 [12] (100 classes, 3,072 dimensions, 50,000 training images); ImageNet-1k [28] (1,000 classes, 196,608 dimensions, 1,281,167 training images)
Dataset Splits | Yes | To choose the original model's regularization λ, we split the original training set into a training set and a validation set, and choose the λ that performs best on the validation set (a λ-selection sketch follows the table).
Hardware Specification | Yes | All the experiments are implemented with PyTorch [26] and conducted on a single 24 GB A5000 GPU.
Software Dependencies | No | All the experiments are implemented with PyTorch [26] (specific versions of PyTorch or other libraries are not provided).
Experiment Setup | Yes | For simplicity, we set λ_S = 0 for all experiments. To choose the original model's regularization λ, we split the original training set into a training set and a validation set, and choose the λ that performs best on the validation set. The mean and standard deviation of test accuracy are computed over four independent runs.
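
For context, here is a minimal sketch of the closed-form kernel ridge regression predictor that a distilled support set is evaluated with. This is not the paper's Algorithm 1; the RBF kernel, bandwidth, and all variable names are illustrative assumptions, and the distilled set is stood in for by random tensors.

```python
# Minimal sketch of kernel ridge regression (KRR) with a small support set.
# Not the paper's Algorithm 1 -- it only shows the closed-form KRR predictor
# that dataset distillation for KRR builds on. Kernel choice (RBF), gamma,
# and data shapes are illustrative assumptions.
import torch

def rbf_kernel(a, b, gamma=1e-3):
    # K[i, j] = exp(-gamma * ||a_i - b_j||^2)
    sq_dists = torch.cdist(a, b) ** 2
    return torch.exp(-gamma * sq_dists)

def krr_predict(X_support, y_support, X_test, lam=0.0):
    # Closed-form KRR prediction:
    #   f(x) = K(x, X_S) (K(X_S, X_S) + lam * n * I)^{-1} y_S
    n = X_support.shape[0]
    K_ss = rbf_kernel(X_support, X_support)
    K_ts = rbf_kernel(X_test, X_support)
    alpha = torch.linalg.solve(K_ss + lam * n * torch.eye(n), y_support)
    return K_ts @ alpha

# Usage: predict from a small distilled set (X_S, y_S) with lam = 0,
# mirroring the "λ_S = 0" setting reported above.
X_S = torch.randn(100, 784)   # 100 distilled images, flattened (hypothetical)
y_S = torch.randn(100, 10)    # one-hot-style regression targets
X_test = torch.randn(5, 784)
preds = krr_predict(X_S, y_S, X_test, lam=0.0)
print(preds.shape)            # torch.Size([5, 10])
```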
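
The following sketch illustrates the regularization search described in the Dataset Splits and Experiment Setup rows: hold out part of the original training set as validation data and keep the λ with the best validation accuracy. The candidate grid, the 10% split, and the accuracy helper are assumptions, not values taken from the paper; `krr_predict` is the helper from the sketch above.

```python
# Sketch of selecting the original model's regularization lambda on a
# held-out validation split. Candidate grid and split ratio are assumptions.
import torch

def accuracy(pred_scores, labels):
    # Fraction of examples whose highest-scoring class matches the label.
    return (pred_scores.argmax(dim=1) == labels).float().mean().item()

def select_lambda(X, y_onehot, labels, candidates=(1e-6, 1e-4, 1e-2, 1.0)):
    # Hold out 10% of the original training set as a validation split.
    n = X.shape[0]
    perm = torch.randperm(n)
    n_val = n // 10
    val_idx, tr_idx = perm[:n_val], perm[n_val:]
    best_lam, best_acc = None, -1.0
    for lam in candidates:
        preds = krr_predict(X[tr_idx], y_onehot[tr_idx], X[val_idx], lam=lam)
        acc = accuracy(preds, labels[val_idx])
        if acc > best_acc:
            best_lam, best_acc = lam, acc
    return best_lam

# Test accuracy would then be reported as mean ± std over four independent
# runs, e.g. accs = torch.tensor(run_accs); accs.mean(), accs.std()
```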