Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Provable and Efficient Dataset Distillation for Kernel Ridge Regression

Authors: Yilan Chen, Wei Huang, Lily Weng

NeurIPS 2024 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We verify our theory experimentally and show that our algorithm outperforms previous work such as KIP while being significantly more efficient, e.g., 15840× faster on CIFAR-100. |
| Researcher Affiliation | Academia | Yilan Chen, UCSD CSE (EMAIL); Wei Huang, RIKEN AIP (EMAIL); Tsui-Wei Weng, UCSD HDSI (EMAIL) |
| Pseudocode | Yes | Algorithm 1: Dataset distillation for kernel ridge regression |
| Open Source Code | Yes | Our code is available at GitHub. |
| Open Datasets | Yes | MNIST [13]: 10 classes, 784 features, 60000 examples; CIFAR-10 [12]: 10 classes, 3072 features, 50000 examples; CIFAR-100 [12]: 100 classes, 3072 features, 50000 examples; ImageNet-1k [28]: 1000 classes, 196608 features, 1281167 examples |
| Dataset Splits | Yes | To choose the original model's regularization λ, we split the original training set into a training set and a validation set, and choose the λ that performs best on the validation set. |
| Hardware Specification | Yes | All the experiments are implemented with PyTorch [26] and conducted on a single 24 GB A5000 GPU. |
| Software Dependencies | No | All the experiments are implemented with PyTorch [26]. (Specific versions of PyTorch and other libraries are not provided.) |
| Experiment Setup | Yes | For simplicity, we set λS = 0 for all experiments. To choose the original model's regularization λ, we split the original training set into a training set and a validation set, and choose the λ that performs best on the validation set. The mean and standard deviation of test accuracy are computed over four independent runs. |
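The λ-selection procedure quoted above (hold out a validation split from the original training set, then keep the λ with the best validation performance) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses an RBF kernel and synthetic regression data as stand-in assumptions, and a simple grid of candidate λ values.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.1):
    # Pairwise squared Euclidean distances, then the RBF kernel.
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def krr_fit_predict(X_tr, y_tr, X_te, lam):
    # Kernel ridge regression in closed form (one common convention):
    # alpha = (K + lam * n * I)^{-1} y
    n = len(X_tr)
    K = rbf_kernel(X_tr, X_tr)
    alpha = np.linalg.solve(K + lam * n * np.eye(n), y_tr)
    return rbf_kernel(X_te, X_tr) @ alpha

# Synthetic stand-in for the original training set (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

# Split the original training set into a training set and a validation set.
X_tr, X_val = X[:150], X[150:]
y_tr, y_val = y[:150], y[150:]

# Choose the lambda that performs best on the validation set.
best_lam, best_err = None, np.inf
for lam in [1e-4, 1e-3, 1e-2, 1e-1, 1.0]:
    pred = krr_fit_predict(X_tr, y_tr, X_val, lam)
    err = np.mean((pred - y_val) ** 2)
    if err < best_err:
        best_lam, best_err = lam, err

print(best_lam, best_err)
```

In the paper's setting the regularizer λS on the distilled (support) set is fixed to 0; the search above corresponds only to the original model's λ.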