Coverage-centric Coreset Selection for High Pruning Rates

Authors: Haizhong Zheng, Rui Liu, Fan Lai, Atul Prakash

ICLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate CCS on five datasets and show that, at high pruning rates (e.g., 90%), it achieves significantly better accuracy than previous SOTA methods (e.g., at least 19.56% higher on CIFAR10) as well as random selection (e.g., 7.04% higher on CIFAR10) and comparable accuracy at low pruning rates. We make our code publicly available at GitHub.
Researcher Affiliation Academia Haizhong Zheng, Rui Liu, Fan Lai, Atul Prakash, Computer Science and Engineering, University of Michigan, Ann Arbor, MI 48109, USA
Pseudocode Yes Algorithm 1: Coverage-centric Coreset Selection (CCS) (a sampling sketch follows the table)
Open Source Code Yes We make our code publicly available at GitHub.
Open Datasets Yes We evaluate CCS on five datasets (CIFAR10, CIFAR100 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011), CINIC10 (Darlow et al., 2018), and ImageNet (Deng et al., 2009)).
Dataset Splits Yes The CINIC10 dataset contains 270,000 images in total and is evenly split into three subsets: training, valid, and test. Guided by (Darlow et al., 2018), we combine the training and valid sets to form a large training dataset containing 180,000 images and measure the test accuracy with the test set (containing 90,000 examples). (A loading sketch for this combined split follows the table.)
Hardware Specification Yes Each model is trained on an NVIDIA 2080Ti GPU.
Software Dependencies No The paper mentions software components and optimizers (e.g., SGD, ResNet architectures, cosine annealing learning rate scheduler) but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup Yes B DETAILED EXPERIMENTAL SETTING CIFAR10 and CIFAR100 (Krizhevsky et al., 2009). We use ResNet18 (He et al., 2016) as the network architecture for CIFAR10/CIFAR100. For all coresets with different pruning rates, we train models for 40,000 iterations with a batch size of 256 (about 200 epochs over the entire dataset). We use the SGD optimizer (0.9 momentum and 0.0002 weight decay) with a 0.1 initial learning rate. The learning rate scheduler is the cosine annealing learning rate scheduler (Loshchilov & Hutter, 2017) with a 0.0001 minimum learning rate. We use a 4-pixel padding crop and a random horizontal flip as data augmentation. [...] Coverage-centric methods setting. We set the number of strata k = 50 for all datasets and pruning rates. We use grid search with a 0.1 step size to find an optimal hard cutoff rate β for different datasets and pruning rates. For each dataset, we list the optimal β value for every α as follows in the format of tuple (α, β). (A training-loop sketch based on this recipe follows the table.)
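The paper's Algorithm 1 combines a hard cutoff on the most difficult examples with stratified sampling over the remaining difficulty-score range. The following is a minimal sketch of that idea, not the authors' released implementation: `scores`, `budget`, `k`, and `beta` follow the paper's notation, while the helper name `ccs_select` and the exact budget-allocation rule are simplifying assumptions made here.

```python
import numpy as np

def ccs_select(scores, budget, k=50, beta=0.1, rng=None):
    """Sketch of coverage-centric coreset selection (CCS).

    scores : per-example difficulty scores (higher = harder).
    budget : number of training examples to keep.
    k      : number of strata over the score range.
    beta   : fraction of the hardest examples cut off before stratifying.
    Returns indices of the selected coreset.
    """
    rng = rng or np.random.default_rng(0)

    # 1. Hard cutoff: drop the beta fraction of hardest (often mislabeled) examples.
    easy_to_hard = np.argsort(scores)
    keep = easy_to_hard[: int(len(scores) * (1 - beta))]

    # 2. Split the remaining score range into k equal-width strata.
    edges = np.linspace(scores[keep].min(), scores[keep].max(), k + 1)
    bins = np.digitize(scores[keep], edges[1:-1])         # stratum id in [0, k-1]
    strata = [keep[bins == i] for i in range(k)]

    # 3. Spread the budget over strata, visiting the smallest strata first so
    #    that sparsely populated score ranges remain covered.
    selected, remaining = [], budget
    for j, i in enumerate(np.argsort([len(s) for s in strata])):
        share = remaining // (k - j)                       # fair share for the remaining strata
        take = min(len(strata[i]), share)
        if take > 0:
            selected.append(rng.choice(strata[i], size=take, replace=False))
            remaining -= take
    return np.concatenate(selected) if selected else np.array([], dtype=int)
```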
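For the CINIC10 split described in the Dataset Splits row, merging the training and valid folders can be done with standard PyTorch utilities. This is only an illustrative sketch: the directory layout, paths, and transform are assumptions, not part of the paper's released code.

```python
from torch.utils.data import ConcatDataset
from torchvision import transforms
from torchvision.datasets import ImageFolder

to_tensor = transforms.ToTensor()

# CINIC10 ships as three ImageFolder-style splits of 90,000 images each.
# Following Darlow et al. (2018), train and valid are merged into one
# 180,000-image training set; the test split is kept for evaluation.
train_set = ConcatDataset([
    ImageFolder("cinic10/train", transform=to_tensor),   # illustrative path
    ImageFolder("cinic10/valid", transform=to_tensor),   # illustrative path
])
test_set = ImageFolder("cinic10/test", transform=to_tensor)
```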
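The CIFAR recipe quoted in the Experiment Setup row maps onto a standard PyTorch training loop. The sketch below reflects the stated hyperparameters (batch size 256, 40,000 iterations, SGD with 0.9 momentum and 0.0002 weight decay, initial learning rate 0.1, cosine annealing down to 0.0001, 4-pixel padding crop and random horizontal flip); it is a sketch under assumptions, not the authors' code: torchvision's `resnet18` stands in for the paper's CIFAR ResNet18, and the random `coreset_indices` placeholder stands in for the output of CCS.

```python
import torch
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms
from torchvision.models import resnet18

# Data augmentation stated in the paper: 4-pixel padding crop + random horizontal flip.
transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
train_set = datasets.CIFAR10("data", train=True, download=True, transform=transform)

# Placeholder coreset: in the paper these indices come from CCS (Algorithm 1).
coreset_indices = torch.randperm(len(train_set))[:5000].tolist()
loader = DataLoader(Subset(train_set, coreset_indices), batch_size=256, shuffle=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = resnet18(num_classes=10).to(device)        # stand-in for the paper's CIFAR ResNet18

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=2e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=40_000, eta_min=1e-4)
criterion = torch.nn.CrossEntropyLoss()

iteration = 0
while iteration < 40_000:                          # ~200 epochs over the full dataset at batch 256
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        scheduler.step()                           # cosine annealing stepped per iteration
        iteration += 1
        if iteration == 40_000:
            break
```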