Coresets for Data-efficient Training of Machine Learning Models
Authors: Baharan Mirzasoleiman, Jeff Bilmes, Jure Leskovec
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive set of experiments show that CRAIG, while achieving practically the same solution, speeds up various IG methods by up to 6x for logistic regression and 3x for training deep neural networks. |
| Researcher Affiliation | Academia | Department of Computer Science, University of California, Los Angeles, USA; Department of Electrical Engineering, University of Washington, Seattle, USA; Department of Computer Science, Stanford University, Stanford, USA. |
| Pseudocode | Yes | The pseudocode for CRAIG is outlined in Algorithm 1. |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of the described methodology. |
| Open Datasets | Yes | We apply L2-regularized logistic regression... to classify the following two datasets from LIBSVM: (1) covtype.binary... and (2) Ijcnn1... on MNIST dataset of handwritten digits... CIFAR10. |
| Dataset Splits | No | As covtype does not come with labeled test data, we randomly split the training data into halves to make the training/test split (training and test sets are consistent for different methods). |
| Hardware Specification | No | The paper mentions 'often GPU computing' in a general context but does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers for ancillary software dependencies. |
| Experiment Setup | Yes | For the convex experiments, we tuned the learning rate for each method... we set λ to 10^-5. ... we used a constant learning rate of 10^-2. ... For optimization we used SGD with a momentum of 0.9. |
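
The Pseudocode row above notes that CRAIG's selection procedure is given as Algorithm 1 in the paper. The snippet below is only a rough sketch of greedy facility-location selection with per-element weights, in the spirit of that algorithm rather than a reproduction of it; the `similarity` matrix, the function name, and the weighting-by-assignment step are assumptions made for illustration.

```python
import numpy as np

def craig_style_selection(similarity: np.ndarray, k: int):
    """Greedy facility-location selection, loosely in the spirit of CRAIG's
    Algorithm 1 (illustrative sketch, not the paper's pseudocode).

    similarity[i, j] is a hypothetical non-negative similarity between the
    gradient proxies of examples i and j. Returns the selected indices and
    per-element weights (number of points each selected element covers best).
    """
    n = similarity.shape[0]
    selected = []
    coverage = np.zeros(n)  # best similarity each point gets from the current subset
    for _ in range(k):
        # Marginal gain of adding each candidate j: F(S + j) - F(S).
        gains = np.maximum(similarity, coverage[:, None]).sum(axis=0) - coverage.sum()
        gains[selected] = -np.inf  # do not re-select an element
        j = int(np.argmax(gains))
        selected.append(j)
        coverage = np.maximum(coverage, similarity[:, j])
    # Weight each selected element by how many points it covers best.
    assignment = np.argmax(similarity[:, selected], axis=1)
    weights = np.bincount(assignment, minlength=len(selected))
    return selected, weights
```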
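The Experiment Setup row quotes three hyperparameters: L2 regularization with λ = 10^-5, a constant learning rate of 10^-2, and SGD with momentum 0.9. Below is a minimal sketch of how such a configuration might look, assuming PyTorch; `model` and `train_loader` are hypothetical placeholders, and folding all three values into one optimizer is purely illustrative, since the paper reports them across different (convex and deep-network) experiments.

```python
import torch.nn as nn
import torch.optim as optim

def train_one_epoch(model: nn.Module, train_loader, device: str = "cpu") -> None:
    """One training epoch using the hyperparameters quoted above (illustrative)."""
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(
        model.parameters(),
        lr=1e-2,            # constant learning rate of 10^-2
        momentum=0.9,       # SGD momentum reported in the paper
        weight_decay=1e-5,  # L2 regularization strength (lambda = 10^-5)
    )
    model.train()
    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
```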