On the Size and Approximation Error of Distilled Datasets
Authors: Alaa Maalouf, Murad Tukan, Noel Loo, Ramin Hasani, Mathias Lechner, Daniela Rus
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify our bounds analytically and empirically. ... (Section 5, Experimental Study) To validate our theoretical bounds, we performed distillation on three datasets: two synthetic datasets ... and one real dataset of MNIST binary and multi-class classification. Full experimental details for all experiments are available in the appendix. |
| Researcher Affiliation | Collaboration | Alaa Maalouf (MIT CSAIL), Murad Tukan (DataHeroes), Noel Loo (MIT CSAIL), Ramin Hasani (MIT CSAIL), Mathias Lechner (MIT CSAIL), Daniela Rus (MIT CSAIL) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available in the supplementary material. |
| Open Datasets | Yes | For our next test, we first consider binary classification on (i) MNIST 0 and 1 digits, (ii) SVHN 0 and 1 digits, and (iii) CIFAR-10 ship vs deer; all with labels −1 and +1, respectively. |
| Dataset Splits | No | The paper mentions using standard datasets like MNIST, SVHN, and CIFAR-10, which have predefined splits. However, it does not explicitly state the training, validation, or test split percentages or sample counts, nor does it cite the specific predefined splits used, as the reproducibility criteria require. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU/CPU models or cloud instance types. |
| Software Dependencies | No | The paper mentions using the Adam optimizer but does not specify version numbers for any software libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | We distill for 20,000 iterations with the Adam optimizer at a learning rate of 0.002, optimizing both images/data positions and labels. We use full-batch gradient descent for the synthetic datasets and a maximum batch size of 2000 for the MNIST experiment. For MNIST we found that, particularly for larger values of n, minibatch training could reach lower distillation losses when optimized for longer, so the closing of the gap between the upper bound and the experimental values in Fig. 4 may be misleading: longer optimization could bring the actual distillation loss lower. We fix λ = 10⁻⁵ and distill down to s = d_k^λ log d_k^λ. We use a squared-exponential kernel with lengthscale parameter l = 1.5: k(x, x′) = exp(−‖x − x′‖₂² / (2l²)). We then sample y ∼ N(0, K_XX + σ_y² I_n), σ_y = 0.01. |
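
The Experiment Setup row quotes enough detail to reconstruct the synthetic-data generation (squared-exponential kernel, GP-sampled labels) and the kernel ridge regression objective that the distilled set is evaluated against. The NumPy sketch below illustrates those two steps under stated assumptions: the sizes n, d, and s, the random initialization of the distilled set (S, y_S), and the plain λI regularization are illustrative choices, not taken from the paper or its supplementary code.

```python
import numpy as np


def se_kernel(A, B, lengthscale=1.5):
    """Squared-exponential kernel: k(x, x') = exp(-||x - x'||_2^2 / (2 l^2))."""
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * lengthscale ** 2))


rng = np.random.default_rng(0)

# Synthetic inputs X with GP-sampled labels y ~ N(0, K_XX + sigma_y^2 I_n),
# sigma_y = 0.01, as quoted in the Experiment Setup row.
n, d = 500, 10  # illustrative sizes (assumption, not the paper's values)
X = rng.standard_normal((n, d))
sigma_y = 0.01
K_XX = se_kernel(X, X)
y = rng.multivariate_normal(np.zeros(n), K_XX + sigma_y ** 2 * np.eye(n))

# Kernel ridge regression distillation loss for a candidate distilled set (S, y_S).
# The lam * I regularization and the random initialization of (S, y_S) are
# assumptions for illustration only.
lam = 1e-5
s = 20  # distilled-set size; the paper sets s = d_k^lambda * log d_k^lambda
S = rng.standard_normal((s, d))
y_S = rng.standard_normal(s)

K_SS = se_kernel(S, S)
K_XS = se_kernel(X, S)
preds = K_XS @ np.linalg.solve(K_SS + lam * np.eye(s), y_S)
distill_loss = np.mean((preds - y) ** 2)
print(f"initial distillation loss: {distill_loss:.4f}")
```

Per the setup quoted above, the paper then minimizes this loss with Adam (learning rate 0.002, 20,000 iterations), optimizing both the distilled positions S and labels y_S; this sketch only evaluates the loss at a random starting point.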