Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Provable and Efficient Dataset Distillation for Kernel Ridge Regression
Authors: Yilan Chen, Wei Huang, Lily Weng
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify our theory experimentally and show that our algorithm outperforms previous work such as KIP while being significantly more efficient, e.g. 15,840× faster on CIFAR-100. |
| Researcher Affiliation | Academia | Yilan Chen UCSD CSE EMAIL Wei Huang RIKEN AIP EMAIL Tsui-Wei Weng UCSD HDSI EMAIL |
| Pseudocode | Yes | Algorithm 1 Dataset distillation for kernel ridge regression |
| Open Source Code | Yes | Our code is available at GitHub. |
| Open Datasets | Yes | MNIST [13] 10 784 60000; CIFAR-10 [12] 10 3072 50000; CIFAR-100 [12] 100 3072 50000; ImageNet-1k [28] 1000 196608 1281167 |
| Dataset Splits | Yes | To choose the original model's regularization λ, we split the original training set into a training set and a validation set, and choose the λ that performs best on the validation set. |
| Hardware Specification | Yes | All the experiments are implemented with PyTorch [26] and conducted on a single 24G A5000 GPU. |
| Software Dependencies | No | All the experiments are implemented with PyTorch [26]. (No specific version of PyTorch or of any other library is provided.) |
| Experiment Setup | Yes | For simplicity, we set λ_S = 0 for all experiments. To choose the original model's regularization λ, we split the original training set into a training set and a validation set, and choose the λ that performs best on the validation set. The mean and standard deviation of test accuracy are computed over four independent runs. |
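
The validation-based choice of λ and the four-run reporting described in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' code: the synthetic Gaussian data, the RBF kernel, the 80/20 split, the candidate λ grid, and all function names (`rbf_kernel`, `krr_fit`, `select_lambda`, and so on) are assumptions made for illustration, and NumPy is used in place of PyTorch to keep the example small.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    # Assumed kernel choice: RBF on squared Euclidean distances.
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def krr_fit(X, Y, lam):
    # Closed-form kernel ridge regression: alpha = (K + lam * I)^{-1} Y.
    K = rbf_kernel(X, X)
    return np.linalg.solve(K + lam * np.eye(len(X)), Y)

def krr_accuracy(X_train, alpha, X_eval, y_eval):
    # Predict class scores and take the argmax as the predicted label.
    scores = rbf_kernel(X_eval, X_train) @ alpha
    return float((scores.argmax(1) == y_eval).mean())

def make_data(rng, n, centers):
    # Hypothetical synthetic data: one Gaussian blob per class.
    y = rng.integers(0, len(centers), size=n)
    X = centers[y] + rng.normal(size=(n, centers.shape[1]))
    return X, y

def select_lambda(X, y, n_classes, lambdas, val_fraction=0.2, seed=0):
    # Split the training set into training and validation parts, then
    # keep the lambda with the best validation accuracy.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    n_val = int(val_fraction * len(X))
    val_idx, tr_idx = perm[:n_val], perm[n_val:]
    Y = np.eye(n_classes)[y]  # one-hot regression targets
    val_acc = {}
    for lam in lambdas:
        alpha = krr_fit(X[tr_idx], Y[tr_idx], lam)
        val_acc[lam] = krr_accuracy(X[tr_idx], alpha, X[val_idx], y[val_idx])
    best = max(val_acc, key=val_acc.get)
    return best, val_acc

lambdas = [1e-3, 1e-2, 1e-1, 1.0, 10.0]  # assumed candidate grid
n_classes, dim = 3, 5
test_accs = []
for run in range(4):  # four independent runs, each with its own seed
    rng = np.random.default_rng(run)
    centers = 2.0 * rng.normal(size=(n_classes, dim))
    X_train, y_train = make_data(rng, 200, centers)
    X_test, y_test = make_data(rng, 100, centers)
    best_lam, _ = select_lambda(X_train, y_train, n_classes, lambdas, seed=run)
    # Refit on the full training set with the selected lambda.
    alpha = krr_fit(X_train, np.eye(n_classes)[y_train], best_lam)
    test_accs.append(krr_accuracy(X_train, alpha, X_test, y_test))

mean_acc, std_acc = float(np.mean(test_accs)), float(np.std(test_accs))
print(f"test accuracy: {mean_acc:.3f} +/- {std_acc:.3f} over {len(test_accs)} runs")
```

Refitting on the full training set after selecting λ is one common convention; the quoted setup does not say whether the original model is retrained this way.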