Dataset Pruning: Reducing Training Data by Examining Generalization Influence
Authors: Shuo Yang, Zeke Xie, Hanyu Peng, Min Xu, Mingming Sun, Ping Li
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The empirically observed generalization gap of dataset pruning is substantially consistent with our theoretical expectations. Furthermore, the proposed method prunes 40% training examples on the CIFAR-10 dataset, halves the convergence time with only 1.3% test accuracy decrease, which is superior to previous score-based sample selection methods. |
| Researcher Affiliation | Collaboration | (1) School of Electrical and Data Engineering, University of Technology Sydney; (2) Cognitive Computing Lab, Baidu Research |
| Pseudocode | Yes | Algorithm 1 Generalization guaranteed dataset pruning. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the proposed methodology or a link to a code repository. |
| Open Datasets | Yes | We evaluate dataset pruning methods on CIFAR10, CIFAR100 (Krizhevsky, 2009), and TinyImageNet (Le & Yang, 2015) datasets. |
| Dataset Splits | No | The paper mentions using a 'validation' set in Table 1 ("validation accuracies"), but it does not provide explicit details about the split percentages, sample counts, or the methodology for creating the training, validation, and test splits needed for reproduction. |
| Hardware Specification | Yes | Time cost (min): 113, 113, 113, 113, 3029; "... time of training 720 architectures on a Tesla V100 GPU" |
| Software Dependencies | No | The paper mentions optimizers (SGD), but does not provide specific version numbers for software dependencies such as programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or GPU acceleration libraries (e.g., CUDA). |
| Experiment Setup | Yes | Specifically, in all experiments, we train the model for 200 epochs with a batch size of 128, a learning rate of 0.01 with cosine annealing learning rate decay strategy, SGD optimizer with the momentum of 0.9 and weight decay of 5e-4, data augmentation of random crop and random horizontal flip. |
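The Research Type and Pseudocode rows above refer to pruning 40% of the CIFAR-10 training set and retraining on the remainder. As a generic, hypothetical illustration of that workflow only (not the paper's Algorithm 1, whose generalization-influence criterion is not reproduced here), a minimal sketch assuming PyTorch and a precomputed per-example score array:

```python
# Hedged illustration of pruning a fixed fraction of training examples by score.
# `scores` is a placeholder for any per-example importance measure; the paper's
# actual generalization-influence criterion is not reproduced here.
import numpy as np
from torch.utils.data import Subset, DataLoader

def prune_dataset(dataset, scores, prune_fraction=0.4):
    """Keep the (1 - prune_fraction) highest-scoring examples as a Subset."""
    n_keep = int(len(dataset) * (1.0 - prune_fraction))
    keep_idx = np.argsort(scores)[::-1][:n_keep]  # highest scores first
    return Subset(dataset, keep_idx.tolist())

# Example usage with random placeholder scores (illustrative only):
# scores = np.random.rand(len(train_set))
# pruned_set = prune_dataset(train_set, scores, prune_fraction=0.4)
# pruned_loader = DataLoader(pruned_set, batch_size=128, shuffle=True)
```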
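The Experiment Setup row quotes the full set of training hyperparameters. A minimal sketch of that configuration, assuming PyTorch/torchvision and CIFAR-10; the ResNet-18 backbone and the crop padding value are illustrative assumptions, not stated in the row:

```python
# Sketch of the reported setup: 200 epochs, batch size 128, SGD (lr=0.01,
# momentum=0.9, weight decay=5e-4), cosine annealing, random crop + horizontal flip.
import torch
import torchvision
import torchvision.transforms as T

transform = T.Compose([
    T.RandomCrop(32, padding=4),   # padding value is an assumption
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])

train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = torchvision.models.resnet18(num_classes=10)  # assumed architecture
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

for epoch in range(200):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```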