Provable and Efficient Dataset Distillation for Kernel Ridge Regression
Authors: Yilan Chen, Wei Huang, Lily Weng
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify our theory experimentally and show that our algorithm outperforms previous work such as KIP while being significantly more efficient, e.g. 15840× faster on CIFAR-100. |
| Researcher Affiliation | Academia | Yilan Chen, UCSD CSE, yic031@ucsd.edu; Wei Huang, RIKEN AIP, wei.huang.vr@riken.jp; Tsui-Wei Weng, UCSD HDSI, lweng@ucsd.edu |
| Pseudocode | Yes | Algorithm 1 Dataset distillation for kernel ridge regression |
| Open Source Code | Yes | Our code is available at GitHub. |
| Open Datasets | Yes | MNIST [13]: 10 classes, 784 dims, 60,000 train samples; CIFAR-10 [12]: 10 classes, 3,072 dims, 50,000; CIFAR-100 [12]: 100 classes, 3,072 dims, 50,000; ImageNet-1k [28]: 1,000 classes, 196,608 dims, 1,281,167 |
| Dataset Splits | Yes | To choose the original model's regularization λ, we split the original training set into a training set and a validation set, and choose the λ that performs best on the validation set. |
| Hardware Specification | Yes | All the experiments are implemented with PyTorch [26] and conducted on a single 24 GB A5000 GPU. |
| Software Dependencies | No | All the experiments are implemented with PyTorch [26] (the specific versions of PyTorch and other libraries are not provided). |
| Experiment Setup | Yes | For simplicity, we set λS = 0 for all experiments. To choose the original model's regularization λ, we split the original training set into a training set and a validation set, and choose the λ that performs best on the validation set. The mean and standard deviation of test accuracy are computed over four independent runs. |
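
The "Pseudocode" row points to Algorithm 1 (dataset distillation for kernel ridge regression), but the algorithm itself is not reproduced in this report. For context only, the sketch below shows the generic KRR distillation objective used by prior work such as KIP: a small support set `(X_s, y_s)` is optimized so that the KRR predictor fit on it reproduces the full training labels. The RBF kernel, optimizer, one-hot label format, and all hyperparameters are illustrative assumptions, not the paper's Algorithm 1 or its settings.

```python
import torch


def rbf_kernel(A, B, gamma=1e-3):
    # Pairwise RBF kernel between rows of A and B (illustrative kernel choice).
    return torch.exp(-gamma * torch.cdist(A, B) ** 2)


def distill_krr(X_train, y_train, n_support=100, lam_s=0.0, steps=500, lr=0.1):
    # Generic KIP-style KRR distillation sketch, NOT the paper's Algorithm 1:
    # optimize a small support set (X_s, y_s) so that the KRR predictor fit on
    # it matches the labels of the full training set. y_train is assumed to be
    # a float one-hot matrix of shape (n, num_classes).
    n = X_train.shape[0]
    idx = torch.randperm(n)[:n_support]
    X_s = X_train[idx].clone().requires_grad_(True)
    y_s = y_train[idx].clone().requires_grad_(True)
    opt = torch.optim.Adam([X_s, y_s], lr=lr)
    eye = torch.eye(n_support, dtype=X_train.dtype)

    for _ in range(steps):
        K_ss = rbf_kernel(X_s, X_s)
        K_ts = rbf_kernel(X_train, X_s)
        # KRR fit on the distilled set (the report notes λS = 0; a tiny jitter
        # keeps the linear solve numerically stable).
        alpha = torch.linalg.solve(K_ss + (lam_s + 1e-6) * eye, y_s)
        loss = ((K_ts @ alpha - y_train) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return X_s.detach(), y_s.detach()
```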
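
The λ-selection protocol quoted under "Dataset Splits" and "Experiment Setup" (hold out a validation split from the original training set and keep the λ that performs best on it) can be sketched as follows. The candidate grid, split fraction, and one-hot label format are assumptions made for illustration; `kernel_fn` stands in for whatever kernel the original model uses.

```python
import torch


def select_lambda(X, y, kernel_fn,
                  candidates=(1e-4, 1e-3, 1e-2, 1e-1, 1.0), val_frac=0.1):
    # Split the original training set into train/validation and keep the λ
    # whose KRR predictor classifies the validation split best.
    n = X.shape[0]
    perm = torch.randperm(n)
    n_val = int(val_frac * n)
    val_idx, tr_idx = perm[:n_val], perm[n_val:]
    X_tr, y_tr = X[tr_idx], y[tr_idx]
    X_val, y_val = X[val_idx], y[val_idx]

    K_tr = kernel_fn(X_tr, X_tr)
    K_val = kernel_fn(X_val, X_tr)
    eye = torch.eye(K_tr.shape[0], dtype=K_tr.dtype)

    best_lam, best_acc = None, -1.0
    for lam in candidates:
        alpha = torch.linalg.solve(K_tr + lam * eye, y_tr)  # KRR fit on the training split
        acc = (K_val @ alpha).argmax(1).eq(y_val.argmax(1)).float().mean().item()
        if acc > best_acc:
            best_lam, best_acc = lam, acc
    return best_lam
```

For datasets of the sizes listed above, solving against the full kernel matrix in memory would not be practical; a scalable solver or subsampled kernel would have to replace the direct `torch.linalg.solve` call in this sketch.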