Data Pruning via Moving-one-Sample-out
Authors: Haoru Tan, Sitong Wu, Fei Du, Yukang Chen, Zhibin Wang, Fan Wang, Xiaojuan Qi
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our MoSo on CIFAR-100 [5], Tiny-ImageNet [49], and ImageNet-1K [31] under different pruning ratios. As shown in Figure 1, our MoSo significantly surpasses the previous state-of-the-art methods, especially for high pruning ratios. |
| Researcher Affiliation | Collaboration | HKU; CUHK; DAMO Academy, Alibaba Group; Hupan Lab, Zhejiang Province |
| Pseudocode | Yes | Algorithm 1: Data Pruning with MoSo (a sketch of the scoring step appears after this table). |
| Open Source Code | Yes | Official Implementation |
| Open Datasets | Yes | We evaluate our method on three well-known public benchmarks: the CIFAR-100 [5], which contains 50,000 training examples of 100 categories; the Tiny-ImageNet [49], which has 100,000 images of 200 classes; and the ImageNet-1K [31], which covers 1,000 classes with more than 1M training images. |
| Dataset Splits | No | The paper uses standard datasets (CIFAR-100, Tiny-ImageNet, ImageNet-1K) but does not explicitly detail the training/validation/test splits (e.g., percentages or specific sample counts) used in its experiments. |
| Hardware Specification | Yes | All the experiments are run on a server with 8 Tesla-V100 GPUs. |
| Software Dependencies | No | We implement our method in PyTorch [2]. However, no specific version number for PyTorch or any other software dependency is provided. |
| Experiment Setup | Yes | We use the same network structure ResNet-50 [22] for both the coreset and the surrogate network on the full data. We train the surrogate network on all datasets for 50 epochs. To estimate the mathematical expectation in Eq. (4) from Proposition 1, we randomly sample 10 time steps (see the sketch below). |
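
To show how Algorithm 1's scoring step and the 10-time-step Monte Carlo estimate of Eq. (4) fit together, here is a minimal PyTorch sketch. It is not the official implementation: the names `flat_grad`, `moso_scores`, `checkpoints`, and `lrs` are illustrative, the full set is fed as one batch for brevity, constant factors that do not affect the ranking are dropped, and the leave-one-out gradient correction is a direct reading of Proposition 1.

```python
# Sketch of the MoSo scoring approximation (Proposition 1 / Eq. (4)):
#   M(z) ~= E_t[ eta_t * <grad L_{D\z}(w_t), grad l(z, w_t)> ],
# estimated by Monte Carlo over randomly sampled surrogate-training steps.
import random
import torch

def flat_grad(loss, model):
    """Gradient of `loss` w.r.t. all parameters, flattened into one vector."""
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])

def moso_scores(model, checkpoints, lrs, inputs, targets, loss_fn, num_steps=10):
    """Estimate MoSo scores using `num_steps` randomly sampled checkpoints.

    checkpoints: state_dicts w_t saved while training the surrogate network
    lrs:         learning rate eta_t in effect at each saved step
    inputs, targets: the full training set as tensors (batch it in practice)
    """
    n = inputs.shape[0]
    scores = torch.zeros(n)
    for t in random.sample(range(len(checkpoints)), num_steps):
        model.load_state_dict(checkpoints[t])
        # Average gradient over the full set at w_t (mean-reduced loss).
        g_full = flat_grad(loss_fn(model(inputs), targets), model)
        for i in range(n):
            g_i = flat_grad(
                loss_fn(model(inputs[i : i + 1]), targets[i : i + 1]), model
            )
            # Leave-one-out gradient: remove sample i's own contribution.
            g_loo = (n * g_full - g_i) / (n - 1)
            scores[i] += lrs[t] * torch.dot(g_loo, g_i)
    return scores / num_steps
```

Given the scores, pruning at ratio `r` would keep the `int((1 - r) * n)` highest-scoring samples, e.g. via `scores.topk(int((1 - r) * n)).indices`, since high-MoSo samples are the ones the method retains.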