Towards Sustainable Learning: Coresets for Data-efficient Deep Learning

Authors: Yu Yang, Hao Kang, Baharan Mirzasoleiman

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experiments on several deep networks trained on vision and NLP datasets, including CIFAR-10, CIFAR-100, Tiny ImageNet, and SNLI, confirm that CREST speeds up training deep networks on very large datasets, by 1.7x to 2.5x with minimum loss in the performance.
Researcher Affiliation | Academia | (1) Department of Computer Science, University of California, Los Angeles, USA; (2) School of Computer Science, Georgia Institute of Technology.
Pseudocode | Yes | Algorithm 1: CoREsets for STochastic GD (CREST)
Open Source Code | Yes | Code can be found at https://github.com/bigml-cs-ucla/crest
Open Datasets | Yes | Our extensive experiments on several deep networks trained on vision and NLP datasets, including CIFAR-10, CIFAR-100, Tiny ImageNet, and SNLI, confirm that CREST speeds up training deep networks on very large datasets... ResNet20 on CIFAR-10 (Krizhevsky et al., 2009), ResNet18 on CIFAR-100 (Krizhevsky et al., 2009), ResNet50 on Tiny ImageNet (Russakovsky et al., 2015), and RoBERTa (Liu et al., 2019) on SNLI (Bowman et al., 2015) with 570K examples.
Dataset Splits | No | The paper does not explicitly provide percentages or sample counts for the training, validation, and test splits used in its experiments. It refers to a '10% budget for training' and reports test accuracy, but gives no explicit details on validation splits.
Hardware Specification | Yes | We ran all experiments with a single NVIDIA RTX A6000 GPU.
Software Dependencies | No | The paper mentions optimizers ('SGD optimizer', 'AdamW optimizer') and models ('RoBERTa', 'ResNet') but does not specify any software libraries or frameworks with version numbers (e.g., PyTorch, TensorFlow, or a Python version).
Experiment Setup | Yes | For all datasets except SNLI, we consider a standard deep learning training pipeline that runs for 200 epochs with an SGD optimizer with momentum 0.9, decays the learning rate by a factor of 0.1 after 60% and 85% of training, and uses a mini-batch size of 128. For fine-tuning RoBERTa on SNLI we used an AdamW optimizer and a learning rate of 1e-5 for 8 epochs, with mini-batch size 32. We tuned the hyperparameters τ ∈ {0.1, 0.05, 0.01, 0.005, 0.001} and h ∈ {1, 2, 4, 8, 10}, and used τ = 0.05, 0.01, 0.005, 0.05 and h = 1, 10, 1, 4 on CIFAR-10, CIFAR-100, Tiny ImageNet, and SNLI, respectively, as listed in Table 6.
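For concreteness, the sketch below restates the quoted experiment setup in code, assuming a PyTorch implementation (the paper does not name a framework, per the Software Dependencies row). Model construction, data loading (mini-batch sizes 128 and 32), and CREST's coreset-selection loop itself are omitted; the SGD base learning rate and the helper-function names are illustrative assumptions, not values stated above.

```python
# Minimal sketch of the reported training configuration (PyTorch assumed).
from torch.optim import SGD, AdamW
from torch.optim.lr_scheduler import MultiStepLR

# Per-dataset CREST hyperparameters as quoted above (Table 6 of the paper).
CREST_HPARAMS = {
    "cifar10":       {"tau": 0.05,  "h": 1},
    "cifar100":      {"tau": 0.01,  "h": 10},
    "tiny_imagenet": {"tau": 0.005, "h": 1},
    "snli":          {"tau": 0.05,  "h": 4},
}

def vision_optimizer(model, epochs=200, base_lr=0.1):
    """SGD with momentum 0.9; LR decayed by 0.1x after 60% and 85% of training.
    base_lr=0.1 is an assumption -- the quoted setup does not state it."""
    opt = SGD(model.parameters(), lr=base_lr, momentum=0.9)
    milestones = [int(0.60 * epochs), int(0.85 * epochs)]  # epochs 120 and 170
    sched = MultiStepLR(opt, milestones=milestones, gamma=0.1)
    return opt, sched

def snli_optimizer(model):
    """AdamW at lr 1e-5 for fine-tuning RoBERTa on SNLI (8 epochs, batch size 32)."""
    return AdamW(model.parameters(), lr=1e-5)
```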