Dataset Distillation via Factorization
Authors: Songhua Liu, Kai Wang, Xingyi Yang, Jingwen Ye, Xinchao Wang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive comparisons and experiments demonstrate that our method can yield significant improvement on downstream classification tasks compared with the previous state of the art, while reducing the total number of compressed parameters by up to 65%. Moreover, distilled datasets by our approach also achieve ~10% higher accuracy than baseline methods in cross-architecture generalization. |
| Researcher Affiliation | Academia | National University of Singapore {songhua.liu,e0823044,xyang}@u.nus.edu, {jingweny,xinchao}@nus.edu.sg |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available here. |
| Open Datasets | Yes | We conduct evaluations of our method on three standard image classification benchmarks: SVHN [28], CIFAR10, and CIFAR100 [22]. |
| Dataset Splits | No | The paper mentions training and test sets but does not explicitly provide details about a validation set split. |
| Hardware Specification | Yes | The maximal configuration of computational resources is 4× 24 GB NVIDIA RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions using "Kornia implementation [35]" for ZCA but does not specify its version or other software dependencies with version numbers. |
| Experiment Setup | Yes | For hallucinators, the encoder and decoder each contain one Conv-ReLU block. The number of feature channels c is 3. We use 5 hallucinators by default. The learning rates of the hallucinators and bases, ηH and ηB, are the same, and for the feature extractor the learning rate ηF is 0.001. Hyper-parameters λcon., λtask, λDD, and λcos. are set as 0.1, 1, 1, and 0.1 empirically. Sensitivities of these hyper-parameters are analyzed in Sec. 4.3. The adversary network has the same architecture as that for computing LDD. In experiments on SVHN and CIFAR10, we incorporate all the bases in each iteration, while in experiments on CIFAR100, we adopt a batch size of 300 when the total number of bases is greater than 1,000. We only consider 2 random hallucinators in one iteration for simplicity. |
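
The Experiment Setup row above gives the hallucinator architecture and training hyper-parameters only in prose. The following is a minimal, hypothetical PyTorch sketch of that configuration, assuming a simple encoder-affine-decoder hallucinator design; the `Hallucinator` class, its channel-wise affine transform, and the `config` dictionary are illustrative assumptions rather than the authors' released implementation.

```python
# Hypothetical sketch of the setup quoted in the "Experiment Setup" row:
# a 1-block Conv-ReLU encoder, a learnable channel-wise affine transform, and a
# 1-block Conv-ReLU decoder, with c = 3 feature channels and 5 hallucinators.
import torch
import torch.nn as nn


class Hallucinator(nn.Module):
    def __init__(self, channels: int = 3):
        super().__init__()
        # Encoder and decoder each contain a single Conv-ReLU block (per the quoted setup).
        self.encoder = nn.Sequential(nn.Conv2d(3, channels, 3, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(nn.Conv2d(channels, 3, 3, padding=1), nn.ReLU())
        # Learnable channel-wise affine transform applied to the encoded bases;
        # this is an assumption about how each hallucinator varies the shared bases.
        self.scale = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.shift = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, bases: torch.Tensor) -> torch.Tensor:
        feat = self.encoder(bases)
        return self.decoder(feat * self.scale + self.shift)


# Hyper-parameters quoted in the row above; key names are shorthand for this sketch.
config = dict(
    num_hallucinators=5,        # 5 hallucinators by default
    lr_hallucinators=None,      # eta_H: equal to eta_B (value not restated in this summary)
    lr_bases=None,              # eta_B
    lr_feature_extractor=1e-3,  # eta_F = 0.001
    lambda_con=0.1, lambda_task=1.0, lambda_DD=1.0, lambda_cos=0.1,
    bases_batch_size=300,       # CIFAR100 only, when the total number of bases exceeds 1,000
    hallucinators_per_iter=2,   # 2 random hallucinators sampled per iteration
)

hallucinators = nn.ModuleList([Hallucinator() for _ in range(config["num_hallucinators"])])
```

In this reading, each hallucinator decodes the shared bases into a distinct set of synthetic images, so the distilled dataset is the cross-product of bases and hallucinators while only the bases and a few small hallucinator networks need to be stored.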