Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Provable and Efficient Dataset Distillation for Kernel Ridge Regression

Authors: Yilan Chen, Wei Huang, Lily Weng

NeurIPS 2024 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We verify our theory experimentally and show that our algorithm outperforms previous work such as KIP while being significantly more efficient, e.g., 15840× faster on CIFAR-100. |
| Researcher Affiliation | Academia | Yilan Chen, UCSD CSE (EMAIL); Wei Huang, RIKEN AIP (EMAIL); Tsui-Wei Weng, UCSD HDSI (EMAIL) |
| Pseudocode | Yes | Algorithm 1: Dataset distillation for kernel ridge regression |
| Open Source Code | Yes | Our code is available at GitHub. |
| Open Datasets | Yes | MNIST [13]: 10 classes, 784 features, 60000 examples; CIFAR-10 [12]: 10 classes, 3072 features, 50000 examples; CIFAR-100 [12]: 100 classes, 3072 features, 50000 examples; ImageNet-1k [28]: 1000 classes, 196608 features, 1281167 examples |
| Dataset Splits | Yes | To choose the original model's regularization λ, we split the original training set into a training set and a validation set, and choose the λ that performs best on the validation set. |
| Hardware Specification | Yes | All the experiments are implemented with PyTorch [26] and conducted on a single 24 GB A5000 GPU. |
| Software Dependencies | No | All the experiments are implemented with PyTorch [26]. (Specific versions of PyTorch and other libraries are not provided.) |
| Experiment Setup | Yes | For simplicity, we set λS = 0 for all experiments. To choose the original model's regularization λ, we split the original training set into a training set and a validation set, and choose the λ that performs best on the validation set. The mean and standard deviation of test accuracy are computed over four independent runs. |
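The λ-selection procedure quoted above (hold out a validation split from the original training set, then keep the λ with the best validation performance) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses an RBF kernel and synthetic regression data as stand-in assumptions, and a simple grid of candidate λ values.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.1):
    # Pairwise squared Euclidean distances, then the RBF kernel.
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def krr_fit_predict(X_tr, y_tr, X_te, lam):
    # Kernel ridge regression in closed form (one common convention):
    # alpha = (K + lam * n * I)^{-1} y
    n = len(X_tr)
    K = rbf_kernel(X_tr, X_tr)
    alpha = np.linalg.solve(K + lam * n * np.eye(n), y_tr)
    return rbf_kernel(X_te, X_tr) @ alpha

# Synthetic stand-in for the original training set (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

# Split the original training set into a training set and a validation set.
X_tr, X_val = X[:150], X[150:]
y_tr, y_val = y[:150], y[150:]

# Choose the lambda that performs best on the validation set.
best_lam, best_err = None, np.inf
for lam in [1e-4, 1e-3, 1e-2, 1e-1, 1.0]:
    pred = krr_fit_predict(X_tr, y_tr, X_val, lam)
    err = np.mean((pred - y_val) ** 2)
    if err < best_err:
        best_lam, best_err = lam, err

print(best_lam, best_err)
```

In the paper's setting the regularizer λS on the distilled (support) set is fixed to 0; the search above corresponds only to the original model's λ.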