Towards Sustainable Learning: Coresets for Data-efficient Deep Learning
Authors: Yu Yang, Hao Kang, Baharan Mirzasoleiman
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments on several deep networks trained on vision and NLP datasets, including CIFAR-10, CIFAR-100, Tiny ImageNet, and SNLI, confirm that CREST speeds up training deep networks on very large datasets by 1.7x to 2.5x with minimal loss in performance. |
| Researcher Affiliation | Academia | Department of Computer Science, University of California, Los Angeles, USA; School of Computer Science, Georgia Institute of Technology. |
| Pseudocode | Yes | Algorithm 1: CoREsets for STochastic GD (CREST). A generic gradient-matching selection sketch is given below the table. |
| Open Source Code | Yes | Code can be found at https://github.com/bigml-cs-ucla/crest |
| Open Datasets | Yes | Our extensive experiments on several deep networks trained on vision and NLP datasets, including CIFAR-10, CIFAR-100, Tiny ImageNet, and SNLI, confirm that CREST speeds up training deep networks on very large datasets... ResNet20 on CIFAR-10 (Krizhevsky et al., 2009), ResNet18 on CIFAR-100 (Krizhevsky et al., 2009), ResNet50 on Tiny ImageNet (Russakovsky et al., 2015), and RoBERTa (Liu et al., 2019) on SNLI (Bowman et al., 2015) with 570K examples. |
| Dataset Splits | No | The paper does not explicitly provide specific percentages or sample counts for training, validation, and test splits for its experiments. It refers to a '10% budget for training' and 'Test accuracy' but lacks explicit details on validation splits. |
| Hardware Specification | Yes | We ran all experiments with a single NVIDIA RTX A6000 GPU. |
| Software Dependencies | No | The paper mentions optimizers like 'SGD optimizer' and 'AdamW optimizer' and models like 'RoBERTa' and 'ResNet' but does not specify any software libraries or frameworks with version numbers (e.g., PyTorch, TensorFlow, Python version). |
| Experiment Setup | Yes | For all datasets except SNLI, we consider a standard deep learning training pipeline that runs for 200 epochs with an SGD optimizer with a momentum of 0.9, decays the learning rate by a factor of 0.1 after 60% and 85% of training, and uses mini-batch size 128. For fine-tuning RoBERTa on SNLI we used an AdamW optimizer and a learning rate of 1e-5 for 8 epochs, with mini-batch size 32. We tuned the hyperparameters τ ∈ {0.1, 0.05, 0.01, 0.005, 0.001} and h ∈ {1, 2, 4, 8, 10}, and used τ = 0.05, 0.01, 0.005, 0.05 and h = 1, 10, 1, 4 on CIFAR-10, CIFAR-100, Tiny ImageNet, and SNLI, respectively, as listed in Table 6. A training-loop sketch of this recipe appears below the table. |
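
The pseudocode row above refers to the paper's Algorithm 1 (CREST). As a rough illustration of the general idea behind gradient-matching coreset selection (choosing a weighted subset whose gradients cover the full set), here is a minimal, hedged sketch of greedy facility-location-style selection over per-example gradient estimates. The function name `select_coreset`, the use of last-layer gradients, and all other details are illustrative assumptions, not the authors' implementation.

```python
# Hedged illustration only: greedy facility-location-style selection of a
# weighted coreset whose gradients cover the full set, in the spirit of
# gradient-matching coreset methods. Not the paper's Algorithm 1.
import numpy as np

def select_coreset(grads: np.ndarray, budget: int):
    """Pick `budget` examples whose gradient estimates best cover all others.

    grads: (n, d) array of per-example gradient estimates (e.g., last layer).
    Returns (indices, weights): weights count how many examples each selected
    point covers, so the weighted coreset gradient approximates the full sum.
    """
    n = grads.shape[0]
    # Similarity = negative pairwise Euclidean distance between gradients.
    sims = -np.linalg.norm(grads[:, None, :] - grads[None, :, :], axis=-1)
    selected: list[int] = []
    best_cover = np.full(n, -np.inf)  # best similarity to any selected point
    for _ in range(budget):
        # Facility-location objective after hypothetically adding candidate j.
        gains = np.maximum(sims, best_cover[:, None]).sum(axis=0)
        gains[selected] = -np.inf  # never re-pick an already selected example
        j = int(np.argmax(gains))
        selected.append(j)
        best_cover = np.maximum(best_cover, sims[:, j])
    # Assign each example to its closest selected point; cluster sizes
    # become the coreset weights.
    assignment = np.argmax(sims[:, selected], axis=1)
    weights = np.bincount(assignment, minlength=len(selected))
    return np.array(selected), weights
```

With the 10% budget mentioned in the dataset-splits row, one would call, for example, `select_coreset(G, budget=int(0.1 * len(G)))` on a matrix `G` of gradient estimates.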
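
For the experiment-setup row, the reported vision recipe (200 epochs, SGD with momentum 0.9, learning rate decayed by 0.1 after 60% and 85% of training, mini-batch size 128) could look roughly like the following PyTorch sketch. PyTorch itself, the torchvision `resnet18` stand-in for ResNet20, the initial learning rate of 0.1, and the bare `ToTensor` transform are assumptions not stated in the table.

```python
# Hedged sketch of the reported vision recipe (200 epochs, SGD momentum 0.9,
# LR x0.1 after 60% and 85% of training, batch size 128). The resnet18
# stand-in for ResNet20, the initial LR of 0.1, and the ToTensor-only
# transform are assumptions.
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
import torchvision
from torchvision import transforms

epochs, batch_size = 200, 128
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)

model = torchvision.models.resnet18(num_classes=10)  # stand-in for ResNet20
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)  # lr assumed
# Decay the learning rate by 0.1 after 60% and 85% of the 200 epochs.
scheduler = optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[int(0.6 * epochs), int(0.85 * epochs)], gamma=0.1)

for epoch in range(epochs):
    for x, y in loader:
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()
    scheduler.step()
```

For the SNLI fine-tuning setting, the analogous sketch would swap in `optim.AdamW(model.parameters(), lr=1e-5)`, 8 epochs, and batch size 32, with a RoBERTa model and text data pipeline in place of the vision components.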