$\mathbb{D}^2$ Pruning: Message Passing for Balancing Diversity and Difficulty in Data Pruning
Authors: Adyasha Maharana, Prateek Yadav, Mohit Bansal
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate supervised and self-supervised versions of our method on various vision and NLP datasets. Results show that D2 PRUNING improves coreset selection over previous state-of-the-art methods at low-to-medium pruning rates. |
| Researcher Affiliation | Academia | Adyasha Maharana, Prateek Yadav & Mohit Bansal Department of Computer Science University of North Carolina at Chapel Hill {adyasha,praty,mbansal}@cs.unc.edu |
| Pseudocode | Yes | Algorithm 1 D2 PRUNING for Data Selection is presented on page 9. |
| Open Source Code | Yes | Our code is available at https://github.com/adymaharana/d2pruning |
| Open Datasets | Yes | We use the CIFAR10, CIFAR100 (Krizhevsky et al., 2009) and ImageNet-1K (Deng et al., 2009) image classification datasets for our experiments on vision benchmarks. We use the Adversarial NLI dataset (Nie et al., 2020) for natural language inference. We use the IMDb reviews dataset (Maas et al., 2011) for the sentiment analysis task. |
| Dataset Splits | Yes | The ImageNet-1K dataset... It contains 1,281,167 and 50,000 images in training and validation splits respectively. The Adversarial NLI dataset... which contains 100,459, 1,200, and 1,200 examples in the training, development, and test splits respectively. We created an in-house version of the IMDb Reviews dataset that contains 2,000 and 1,000 samples in the training and development splits respectively... |
| Hardware Specification | Yes | Graph initialization involves getting the k-nearest neighbors which are computed on an A100 GPU using PyTorch... Results are computed using a multi-thread implementation of D2 PRUNING using 8 workers on a CPU with 32 cores. Additionally, we provide the approximate training times for each dataset computed on a single A100 GPU. (A hedged sketch of the kNN graph construction and message passing follows the table.) |
| Software Dependencies | No | The paper mentions using "PyTorch" but does not provide specific version numbers for this or any other software dependencies. |
| Experiment Setup | Yes | For D2 PRUNING, we set the forward message passing weight γ_f to 1.0 and perform a sweep over k = {1, 5, 10, 15} and γ_r = {0, 0.1, 0.2, ..., 1.0} for the CIFAR10 and CIFAR100 datasets... For fine-tuning of pretrained RoBERTa on NLP datasets, we perform a grid search over learning rates {1e-5, 2e-5, 5e-5, 1e-4} and batch sizes {8, 16, 32}... RoBERTa models are trained for 10,000 and 1,500 training steps for the Adversarial NLI and IMDb (2k) datasets respectively, with early stopping. (A hedged sketch of this sweep also follows the table.) |
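
For readers who want a concrete picture of the graph initialization and message passing referenced in the hardware row, below is a minimal sketch in PyTorch. It is an assumption-laden illustration, not the authors' released implementation (that lives at https://github.com/adymaharana/d2pruning): the helper names `build_knn_graph` and `d2_select`, the RBF-style edge weights, and the exact forward/reverse update rules are all illustrative choices.

```python
import torch

def build_knn_graph(embeddings: torch.Tensor, k: int = 10):
    """Build a kNN graph over example embeddings (illustrative, not the official code)."""
    dists = torch.cdist(embeddings, embeddings)             # (N, N) pairwise L2 distances
    knn_dists, knn_idx = dists.topk(k + 1, largest=False)   # nearest neighbors incl. self
    knn_dists, knn_idx = knn_dists[:, 1:], knn_idx[:, 1:]   # drop the self-edge
    weights = torch.exp(-knn_dists)                         # assumed RBF-style edge weights
    return knn_idx, weights

def d2_select(scores: torch.Tensor, knn_idx: torch.Tensor, weights: torch.Tensor,
              budget: int, gamma_f: float = 1.0, gamma_r: float = 0.5):
    """Greedy coreset selection with forward/reverse message passing (sketch)."""
    # Forward message passing: each example absorbs its neighbors' difficulty scores,
    # scaled by the edge weights and gamma_f.
    s = scores + gamma_f * (weights * scores[knn_idx]).sum(dim=1)
    selected = []
    active = torch.ones_like(s, dtype=torch.bool)
    for _ in range(budget):
        i = int(s.masked_fill(~active, float("-inf")).argmax())
        selected.append(i)
        active[i] = False
        # Reverse message passing: down-weight the selected example's neighbors so the
        # next picks favor other regions of the embedding space (diversity).
        s[knn_idx[i]] -= gamma_r * weights[i] * s[i]
    return selected

# Toy usage: keep 300 of 1,000 examples with random embeddings and difficulty scores.
emb, scores = torch.randn(1000, 64), torch.rand(1000)
idx, w = build_knn_graph(emb, k=10)
coreset = d2_select(scores, idx, w, budget=300, gamma_f=1.0, gamma_r=0.2)
```

The greedy loop above is O(budget · N); the paper instead reports a multi-threaded CPU implementation with 8 workers, which this toy version does not attempt to reproduce.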
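
The k and γ_r sweep quoted in the experiment setup row could then be driven by a loop like the one below, reusing the toy `emb`/`scores` and the hypothetical helpers from the previous sketch. In the paper each configuration is scored by training a model on the resulting coreset; the snippet only marks that step with a placeholder comment.

```python
from itertools import product

# Assumed sweep ranges copied from the setup row: k in {1, 5, 10, 15},
# gamma_r in {0, 0.1, ..., 1.0}, with gamma_f fixed to 1.0.
for k, gamma_r in product([1, 5, 10, 15], [round(0.1 * i, 1) for i in range(11)]):
    idx, w = build_knn_graph(emb, k=k)
    coreset = d2_select(scores, idx, w, budget=300, gamma_f=1.0, gamma_r=gamma_r)
    # ...train on `coreset`, evaluate on the dev split, and keep the best (k, gamma_r)...
```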