Data Pruning via Moving-one-Sample-out
Authors: Haoru Tan, Sitong Wu, Fei Du, Yukang Chen, Zhibin Wang, Fan Wang, Xiaojuan Qi
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our MoSo on CIFAR-100 [5], Tiny-ImageNet [49], and ImageNet-1K [31] under different pruning ratios. As shown in Figure 1, our MoSo significantly surpasses the previous state-of-the-art methods, especially for high pruning ratios. |
| Researcher Affiliation | Collaboration | HKU; CUHK; DAMO Academy, Alibaba Group; Hupan Lab, Zhejiang Province |
| Pseudocode | Yes | Algorithm 1: Data Pruning with MoSo (a sketch of the scoring step appears after this table). |
| Open Source Code | Yes | Official Implementation |
| Open Datasets | Yes | We evaluate our method on three well-known public benchmarks: the CIFAR-100 [5], which contains 50,000 training examples of 100 categories; the Tiny-ImageNet [49], which has 100,000 images of 200 classes; and the ImageNet-1K [31], which covers 1,000 classes with more than 1M training images. |
| Dataset Splits | No | The paper uses standard datasets (CIFAR-100, Tiny-ImageNet, ImageNet-1K) but does not explicitly detail the training/validation/test splits (e.g., percentages or specific sample counts) used in its experiments. |
| Hardware Specification | Yes | All the experiments are run on a server with 8 Tesla-V100 GPUs. |
| Software Dependencies | No | We implement our method in PyTorch [2]. However, no specific version number for PyTorch or any other software dependency is provided. |
| Experiment Setup | Yes | We use the same network structure ResNet-50 [22] for both the coreset and the surrogate network on the full data. We train the surrogate network on all datasets for 50 epochs. To estimate the mathematical expectation in Eq. (4) from Proposition 1, we randomly sample 10 time steps (see the sketch below). |
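
To show how Algorithm 1's scoring step and the 10-time-step Monte Carlo estimate of Eq. (4) fit together, here is a minimal PyTorch sketch. It is not the official implementation: the names `flat_grad`, `moso_scores`, `checkpoints`, and `lrs` are illustrative, the full set is fed as one batch for brevity, constant factors that do not affect the ranking are dropped, and the leave-one-out gradient correction is a direct reading of Proposition 1.

```python
# Sketch of the MoSo scoring approximation (Proposition 1 / Eq. (4)):
#   M(z) ~= E_t[ eta_t * <grad L_{D\z}(w_t), grad l(z, w_t)> ],
# estimated by Monte Carlo over randomly sampled surrogate-training steps.
import random
import torch

def flat_grad(loss, model):
    """Gradient of `loss` w.r.t. all parameters, flattened into one vector."""
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])

def moso_scores(model, checkpoints, lrs, inputs, targets, loss_fn, num_steps=10):
    """Estimate MoSo scores using `num_steps` randomly sampled checkpoints.

    checkpoints: state_dicts w_t saved while training the surrogate network
    lrs:         learning rate eta_t in effect at each saved step
    inputs, targets: the full training set as tensors (batch it in practice)
    """
    n = inputs.shape[0]
    scores = torch.zeros(n)
    for t in random.sample(range(len(checkpoints)), num_steps):
        model.load_state_dict(checkpoints[t])
        # Average gradient over the full set at w_t (mean-reduced loss).
        g_full = flat_grad(loss_fn(model(inputs), targets), model)
        for i in range(n):
            g_i = flat_grad(
                loss_fn(model(inputs[i : i + 1]), targets[i : i + 1]), model
            )
            # Leave-one-out gradient: remove sample i's own contribution.
            g_loo = (n * g_full - g_i) / (n - 1)
            scores[i] += lrs[t] * torch.dot(g_loo, g_i)
    return scores / num_steps
```

Given the scores, pruning at ratio `r` would keep the `int((1 - r) * n)` highest-scoring samples, e.g. via `scores.topk(int((1 - r) * n)).indices`, since high-MoSo samples are the ones the method retains.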