Coresets for Data-efficient Training of Machine Learning Models

Authors: Baharan Mirzasoleiman, Jeff Bilmes, Jure Leskovec

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive set of experiments show that CRAIG, while achieving practically the same solution, speeds up various IG methods by up to 6x for logistic regression and 3x for training deep neural networks.
Researcher Affiliation | Academia | Department of Computer Science, University of California, Los Angeles, USA; Department of Electrical Engineering, University of Washington, Seattle, USA; Department of Computer Science, Stanford University, Stanford, USA.
Pseudocode | Yes | The pseudocode for CRAIG is outlined in Algorithm 1. (A hedged sketch of this greedy selection step appears below the table.)
Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of the described methodology.
Open Datasets | Yes | We apply L2-regularized logistic regression... to classify the following two datasets from LIBSVM: (1) covtype.binary... and (2) Ijcnn1... on MNIST dataset of handwritten digits... CIFAR10. (A loading sketch for these datasets appears below the table.)
Dataset Splits | No | As covtype does not come with labeled test data, we randomly split the training data into halves to make the training/test split (training and test sets are consistent for different methods).
Hardware Specification | No | The paper mentions 'often GPU computing' in a general context but does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper does not provide specific software names with version numbers for ancillary software dependencies.
Experiment Setup | Yes | For the convex experiments, we tuned the learning rate for each method... we set λ to 10^-5. ... we used a constant learning rate of 10^-2. ... For optimization we used SGD with a momentum of 0.9. (A sketch of this setup appears below the table.)
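
On the Pseudocode row: Algorithm 1 of the paper greedily builds a weighted subset by maximizing a submodular facility-location objective over pairwise per-example (gradient) distances, then weights each selected example by how many examples it covers. The NumPy sketch below is a minimal illustration of that selection step under simplifying assumptions, not the authors' code: the function name craig_coreset is hypothetical, raw feature vectors stand in for the paper's per-example gradient upper bounds, and the naive greedy loop replaces the faster lazy/stochastic greedy described in the paper.

```python
import numpy as np

def craig_coreset(features, k):
    """Minimal sketch of CRAIG-style greedy coreset selection.

    features : (n, d) array standing in for per-example gradient (upper-bound) vectors.
    k        : coreset size.
    Returns the indices of the selected examples and their integer weights
    (how many examples each selected point "covers").
    """
    n = features.shape[0]
    # Pairwise Euclidean distances, turned into non-negative similarities by
    # subtracting from their maximum (the facility-location construction used by CRAIG).
    sq = (features ** 2).sum(axis=1)
    dist = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * features @ features.T, 0.0))
    sim = dist.max() - dist

    selected = []
    best = np.zeros(n)  # best similarity of each example to the current coreset
    for _ in range(k):
        # Marginal gain of adding candidate c: sum_i max(sim[i, c] - best[i], 0)
        gains = np.clip(sim - best[:, None], 0.0, None).sum(axis=0)
        c = int(np.argmax(gains))
        selected.append(c)
        best = np.maximum(best, sim[:, c])

    # Weight each selected example by the number of examples closest to it.
    assign = np.argmax(sim[:, selected], axis=1)
    weights = np.bincount(assign, minlength=len(selected))
    return np.array(selected), weights
```

In the full method, the returned weights multiply the corresponding per-example gradient updates during training, so the weighted coreset gradient approximates the full-data gradient.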
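
On the Open Datasets row: all four quoted datasets are publicly available. A hedged loading sketch, assuming scikit-learn and torchvision are installed and the two LIBSVM files have already been downloaded (the local file names below are placeholders, not paths from the paper):

```python
from sklearn.datasets import load_svmlight_file
from torchvision import datasets

# LIBSVM datasets: download covtype.binary and ijcnn1 from the LIBSVM site first;
# the file names here stand in for whatever local copies are used.
X_covtype, y_covtype = load_svmlight_file("covtype.binary.scale")
X_ijcnn1, y_ijcnn1 = load_svmlight_file("ijcnn1")

# Image datasets used for the deep-network experiments.
mnist_train = datasets.MNIST(root="./data", train=True, download=True)
cifar_train = datasets.CIFAR10(root="./data", train=True, download=True)
```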
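
On the Experiment Setup row: the quoted hyperparameters (regularization λ = 10^-5, a constant learning rate of 10^-2, SGD with momentum 0.9) can be wired into a weighted training step. The sketch below assumes a PyTorch pipeline, which the paper does not name; the linear model, its dimensions, and the train_step helper are illustrative only, and combining all three values in one optimizer follows the quoted text loosely, since the paper spreads them across its convex and deep-network experiments.

```python
import torch
import torch.nn as nn

# Illustrative model: a linear (logistic-regression-like) classifier.
# 54 input features match covtype; 2 classes for the binary task (assumption).
model = nn.Linear(54, 2)
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=1e-2,            # constant learning rate of 10^-2 quoted in the setup
    momentum=0.9,       # SGD momentum quoted in the setup
    weight_decay=1e-5,  # L2 regularization strength (lambda = 10^-5) quoted in the setup
)
criterion = nn.CrossEntropyLoss(reduction="none")

def train_step(x, y, w):
    """One SGD step on a coreset batch; w holds the per-example coreset weights."""
    optimizer.zero_grad()
    loss = (w * criterion(model(x), y)).sum() / w.sum()
    loss.backward()
    optimizer.step()
    return loss.item()
```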