Coverage-centric Coreset Selection for High Pruning Rates
Authors: Haizhong Zheng, Rui Liu, Fan Lai, Atul Prakash
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate CCS on five datasets and show that, at high pruning rates (e.g., 90%), it achieves significantly better accuracy than previous SOTA methods (e.g., at least 19.56% higher on CIFAR10) as well as random selection (e.g., 7.04% higher on CIFAR10) and comparable accuracy at low pruning rates. We make our code publicly available at GitHub. |
| Researcher Affiliation | Academia | Haizhong Zheng, Rui Liu, Fan Lai, Atul Prakash; Computer Science and Engineering, University of Michigan, Ann Arbor, MI 48109, USA |
| Pseudocode | Yes | Algorithm 1: Coverage-centric Coreset Selection (CCS) |
| Open Source Code | Yes | We make our code publicly available at GitHub. |
| Open Datasets | Yes | We evaluate CCS on five datasets (CIFAR10, CIFAR100 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011), CINIC10 (Darlow et al., 2018), and ImageNet (Deng et al., 2009)). |
| Dataset Splits | Yes | The CINIC10 dataset contains 270,000 images in total and is evenly split into three subsets: training, valid, and test. Guided by (Darlow et al., 2018), we combine the training and valid sets to form a large training dataset containing 180,000 images and measure the test accuracy with the test set (containing 90,000 examples). |
| Hardware Specification | Yes | Each model is trained on an NVIDIA 2080Ti GPU. |
| Software Dependencies | No | The paper mentions software components and optimizers (e.g., SGD, ResNet architectures, cosine annealing learning rate scheduler) but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | B DETAILED EXPERIMENTAL SETTING CIFAR10 and CIFAR100 (Krizhevsky et al., 2009). We use ResNet18 (He et al., 2016) as the network architecture for CIFAR10/CIFAR100. For all coresets with different pruning rates, we train models for 40,000 iterations with a batch size of 256 (about 200 epochs over the entire dataset). We use the SGD optimizer (0.9 momentum and 0.0002 weight decay) with a 0.1 initial learning rate. The learning rate scheduler is the cosine annealing learning rate scheduler (Loshchilov & Hutter, 2017) with a 0.0001 minimum learning rate. We use a 4-pixel padding crop and a random horizontal flip as data augmentation. [...] Coverage-centric methods setting. We set the number of strata k = 50 for all datasets and pruning rates. We use grid search with 0.1 step size to find an optimal hard cutoff rate β for different datasets and pruning rates. For each dataset, we list the optimal β value for every α as follows in the format of tuple (α, β). |
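
The pseudocode and coverage-centric settings quoted above (Algorithm 1, a hard cutoff rate β, and stratified sampling over k = 50 strata) can be rendered roughly as follows. This is a hypothetical NumPy sketch based only on the excerpts in this table, not the authors' released code; the function name `ccs_select`, its arguments, and the smallest-stratum-first budget allocation are illustrative assumptions.

```python
import numpy as np

def ccs_select(scores, budget, beta, k=50, seed=0):
    """Hypothetical sketch of coverage-centric selection: drop the hardest
    `beta` fraction by difficulty score, split the remaining score range into
    `k` strata, and spread `budget` picks as evenly as possible across them."""
    rng = np.random.default_rng(seed)

    # 1. Hard cutoff: remove the beta fraction with the highest difficulty scores.
    order = np.argsort(scores)                    # indices, ascending difficulty
    kept = order[: len(scores) - int(beta * len(scores))]

    # 2. Bin the kept examples into k equal-width strata over the score range.
    kept_scores = scores[kept]
    edges = np.linspace(kept_scores.min(), kept_scores.max(), k + 1)
    bins = np.digitize(kept_scores, edges[1:-1])  # stratum index in [0, k-1]
    strata = [kept[bins == b] for b in range(k) if np.any(bins == b)]

    # 3. Fill the budget stratum by stratum, smallest first, so that unused
    #    quota from sparse strata is redistributed to the denser ones.
    strata.sort(key=len)
    selected, remaining = [], budget
    for i, stratum in enumerate(strata):
        quota = min(len(stratum), remaining // (len(strata) - i))
        selected.append(rng.choice(stratum, size=quota, replace=False))
        remaining -= quota
    return np.concatenate(selected)

# Example: keep 10% of 50,000 CIFAR10 examples (90% pruning) with beta = 0.1.
scores = np.random.rand(50_000)                   # placeholder difficulty scores
coreset_indices = ccs_select(scores, budget=5_000, beta=0.1)
```

Visiting the smallest strata first lets any unused quota flow to the larger strata, which is one plausible way to keep coverage across the whole difficulty range at high pruning rates.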
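
The CIFAR10/CIFAR100 training recipe quoted in the Experiment Setup row (ResNet18, SGD with 0.9 momentum and 0.0002 weight decay, a 0.1 initial learning rate, cosine annealing to 0.0001 over 40,000 iterations, batch size 256, 4-pixel padding crop and random horizontal flip) maps onto a PyTorch configuration roughly as below. This is a minimal sketch assuming torchvision components; the paper likely uses a CIFAR-adapted ResNet18 rather than torchvision's ImageNet variant, and the training loop itself is omitted.

```python
import torch
import torchvision
import torchvision.transforms as T

# Data augmentation from the quoted setup: 4-pixel padding crop + horizontal flip.
transform = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])
train_set = torchvision.datasets.CIFAR10("data", train=True, download=True,
                                          transform=transform)
# coreset = torch.utils.data.Subset(train_set, coreset_indices)  # from ccs_select above

# torchvision's ResNet18 (the paper likely uses a CIFAR-adapted variant).
model = torchvision.models.resnet18(num_classes=10)

# SGD with 0.9 momentum, 0.0002 weight decay, and a 0.1 initial learning rate.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=2e-4)

# Cosine annealing over the 40,000 training iterations down to a 1e-4 floor.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=40_000, eta_min=1e-4)
```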
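
The CINIC10 handling quoted in the Dataset Splits row (merging the 90,000-image training and validation subsets into a 180,000-image training set) can be expressed with a `ConcatDataset`; the directory layout below is an assumption based on how CINIC10 is commonly distributed, not something stated in the table.

```python
import torchvision.transforms as T
from torch.utils.data import ConcatDataset
from torchvision.datasets import ImageFolder

# Assumed CINIC10 layout: CINIC-10/{train,valid,test}, 90,000 images each.
to_tensor = T.ToTensor()
cinic_train = ImageFolder("CINIC-10/train", transform=to_tensor)
cinic_valid = ImageFolder("CINIC-10/valid", transform=to_tensor)
cinic_test = ImageFolder("CINIC-10/test", transform=to_tensor)

# Combine train and valid into the 180,000-image training set; evaluate on test.
combined_train = ConcatDataset([cinic_train, cinic_valid])
```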