Coresets for Data-efficient Training of Machine Learning Models

Authors: Baharan Mirzasoleiman, Jeff Bilmes, Jure Leskovec

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive set of experiments show that CRAIG, while achieving practically the same solution, speeds up various IG methods by up to 6x for logistic regression and 3x for training deep neural networks.
Researcher Affiliation | Academia | Department of Computer Science, University of California, Los Angeles, USA; Department of Electrical Engineering, University of Washington, Seattle, USA; Department of Computer Science, Stanford University, Stanford, USA.
Pseudocode | Yes | The pseudocode for CRAIG is outlined in Algorithm 1. (A hedged sketch of this greedy selection step appears below the table.)
Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of the described methodology.
Open Datasets | Yes | We apply L2-regularized logistic regression... to classify the following two datasets from LIBSVM: (1) covtype.binary... and (2) Ijcnn1... on MNIST dataset of handwritten digits... CIFAR10. (A loading sketch for these datasets appears below the table.)
Dataset Splits | No | As covtype does not come with labeled test data, we randomly split the training data into halves to make the training/test split (training and test sets are consistent for different methods).
Hardware Specification | No | The paper mentions 'often GPU computing' in a general context but does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper does not provide specific software names with version numbers for ancillary software dependencies.
Experiment Setup | Yes | For the convex experiments, we tuned the learning rate for each method... we set λ to 10^-5. ... we used a constant learning rate of 10^-2. ... For optimization we used SGD with a momentum of 0.9. (A sketch of this setup appears below the table.)
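
On the Pseudocode row: Algorithm 1 of the paper greedily builds a weighted subset by maximizing a submodular facility-location objective over pairwise per-example (gradient) distances, then weights each selected example by how many examples it covers. The NumPy sketch below is a minimal illustration of that selection step under simplifying assumptions, not the authors' code: the function name craig_coreset is hypothetical, raw feature vectors stand in for the paper's per-example gradient upper bounds, and the naive greedy loop replaces the faster lazy/stochastic greedy described in the paper.

```python
import numpy as np

def craig_coreset(features, k):
    """Minimal sketch of CRAIG-style greedy coreset selection.

    features : (n, d) array standing in for per-example gradient (upper-bound) vectors.
    k        : coreset size.
    Returns the indices of the selected examples and their integer weights
    (how many examples each selected point "covers").
    """
    n = features.shape[0]
    # Pairwise Euclidean distances, turned into non-negative similarities by
    # subtracting from their maximum (the facility-location construction used by CRAIG).
    sq = (features ** 2).sum(axis=1)
    dist = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * features @ features.T, 0.0))
    sim = dist.max() - dist

    selected = []
    best = np.zeros(n)  # best similarity of each example to the current coreset
    for _ in range(k):
        # Marginal gain of adding candidate c: sum_i max(sim[i, c] - best[i], 0)
        gains = np.clip(sim - best[:, None], 0.0, None).sum(axis=0)
        c = int(np.argmax(gains))
        selected.append(c)
        best = np.maximum(best, sim[:, c])

    # Weight each selected example by the number of examples closest to it.
    assign = np.argmax(sim[:, selected], axis=1)
    weights = np.bincount(assign, minlength=len(selected))
    return np.array(selected), weights
```

In the full method, the returned weights multiply the corresponding per-example gradient updates during training, so the weighted coreset gradient approximates the full-data gradient.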
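
On the Open Datasets row: all four quoted datasets are publicly available. A hedged loading sketch, assuming scikit-learn and torchvision are installed and the two LIBSVM files have already been downloaded (the local file names below are placeholders, not paths from the paper):

```python
from sklearn.datasets import load_svmlight_file
from torchvision import datasets

# LIBSVM datasets: download covtype.binary and ijcnn1 from the LIBSVM site first;
# the file names here stand in for whatever local copies are used.
X_covtype, y_covtype = load_svmlight_file("covtype.binary.scale")
X_ijcnn1, y_ijcnn1 = load_svmlight_file("ijcnn1")

# Image datasets used for the deep-network experiments.
mnist_train = datasets.MNIST(root="./data", train=True, download=True)
cifar_train = datasets.CIFAR10(root="./data", train=True, download=True)
```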
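
On the Experiment Setup row: the quoted hyperparameters (regularization λ = 10^-5, a constant learning rate of 10^-2, SGD with momentum 0.9) can be wired into a weighted training step. The sketch below assumes a PyTorch pipeline, which the paper does not name; the linear model, its dimensions, and the train_step helper are illustrative only, and combining all three values in one optimizer follows the quoted text loosely, since the paper spreads them across its convex and deep-network experiments.

```python
import torch
import torch.nn as nn

# Illustrative model: a linear (logistic-regression-like) classifier.
# 54 input features match covtype; 2 classes for the binary task (assumption).
model = nn.Linear(54, 2)
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=1e-2,            # constant learning rate of 10^-2 quoted in the setup
    momentum=0.9,       # SGD momentum quoted in the setup
    weight_decay=1e-5,  # L2 regularization strength (lambda = 10^-5) quoted in the setup
)
criterion = nn.CrossEntropyLoss(reduction="none")

def train_step(x, y, w):
    """One SGD step on a coreset batch; w holds the per-example coreset weights."""
    optimizer.zero_grad()
    loss = (w * criterion(model(x), y)).sum() / w.sum()
    loss.backward()
    optimizer.step()
    return loss.item()
```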