Submodularity in Data Subset Selection and Active Learning

Authors: Kai Wei, Rishabh Iyer, Jeff Bilmes

ICML 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We extensively evaluate the proposed framework on text categorization and handwritten digit recognition tasks with four different classifiers, including deep neural network (DNN) based classifiers. Empirical results indicate that the proposed framework yields significant improvement over the state-of-the-art algorithms on all classifiers.
Researcher Affiliation | Academia | Kai Wei (kaiwei@u.washington.edu), Rishabh Iyer (rkiyer@u.washington.edu), Jeff Bilmes (bilmes@u.washington.edu), University of Washington, Seattle, WA 98195, USA
Pseudocode | Yes | Algorithm 1: Filtered Active Submodular Selection (a sketch of this selection step appears after the table)
Open Source Code | No | The paper mentions using third-party tools such as LIBLINEAR and Caffe, but provides no explicit statement of, or link to, the source code for the authors' own method.
Open Datasets | Yes | We evaluate text categorization on the 20 Newsgroups data set, which consists of 18,774 articles divided almost evenly among 20 different Usenet discussion groups (Lang, 1995). ... We evaluate the handwritten digit recognition task on the MNIST database, which consists of 60,000 training and 10,000 test samples.
Dataset Splits | Yes | For each instance of the experiment, we randomly split 2/3 of the whole data set as the training and test samples. ... The MNIST database ... consists of 60,000 training and 10,000 test samples.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using 'LIBLINEAR tools' and 'Caffe' but does not specify their version numbers.
Experiment Setup | Yes | For mini-batch active learning experiments, we first randomly label B = 100 samples, on which we train a classifier as the initial model. In each iteration, additional B unlabeled examples are selected for labeling to update the model. We evaluate for T = 10 iterations ending with a total of k = 1000 labeled examples. ... For FASS, we fix β_t = β = 4000 for all t, and test four different submodular objectives: f_NB, f_NN, f_fac, and f_fs (c = 0.1). ... We apply a Laplace smoothing parameter of 0.02 for training all NB models in the experiments. ... A DNN model, which consists of two convolution layers followed by two fully connected layers, is trained using Caffe... (a sketch of this mini-batch protocol appears after the table)
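
The report does not reproduce Algorithm 1, so the following is a minimal sketch of the named selection step under stated assumptions: uncertainty is taken as one minus the top predicted class probability, similarity is cosine similarity, the classifier exposes a scikit-learn-style predict_proba, and facility location stands in for the four submodular objectives listed in the setup. This is an illustrative reconstruction, not the authors' implementation.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def fass_select(model, X_unlabeled, batch_size, beta):
    """One round of the FASS selection step (sketch, not the authors' code).

    Filter: keep the beta most uncertain unlabeled points (uncertainty here
    is 1 - max predicted class probability, an assumed measure).
    Select: greedily pick batch_size points from the filtered pool that
    maximize the facility-location objective
        f(S) = sum_i max_{j in S} sim(i, j)
    over the filtered pool. Returns indices into X_unlabeled.
    """
    proba = model.predict_proba(X_unlabeled)
    uncertainty = 1.0 - proba.max(axis=1)
    candidates = np.argsort(-uncertainty)[:beta]

    # Pairwise similarity restricted to the filtered pool; cosine similarity
    # is an illustrative choice, not necessarily the kernel used in the paper.
    sim = cosine_similarity(X_unlabeled[candidates])

    selected = []                                 # local indices into `candidates`
    covered = np.zeros(len(candidates))           # max similarity of each point to S
    for _ in range(min(batch_size, len(candidates))):
        # Vectorized marginal gain of every candidate under facility location.
        gains = np.maximum(covered[:, None], sim).sum(axis=0) - covered.sum()
        gains[selected] = -np.inf                 # never re-pick a selected point
        best = int(np.argmax(gains))
        selected.append(best)
        covered = np.maximum(covered, sim[:, best])
    return candidates[selected]
```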
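
The quoted Experiment Setup is itself a small protocol, so here is a hedged sketch of that mini-batch loop built on the fass_select sketch above. The logistic-regression classifier is only a stand-in for the paper's LIBLINEAR, naive Bayes, nearest-neighbor, and DNN classifiers, and counting the random seed batch as the first of the T iterations is an assumption made so that B = 100, T = 10, and k = 1000 are consistent.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def run_minibatch_active_learning(X, y, X_test, y_test,
                                  B=100, T=10, beta=4000, seed=0):
    """Mini-batch active learning loop following the quoted setup (sketch):
    B randomly labeled seed examples, then B more per round chosen by
    fass_select, ending with k = B * T labeled examples."""
    rng = np.random.default_rng(seed)
    labeled = list(rng.choice(len(X), size=B, replace=False))

    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    accuracies = [clf.score(X_test, y_test)]      # accuracy after the seed batch

    for _ in range(T - 1):                        # seed batch counts as iteration 1
        unlabeled = np.setdiff1d(np.arange(len(X)), labeled)
        picked = fass_select(clf, X[unlabeled], batch_size=B,
                             beta=min(beta, len(unlabeled)))
        labeled.extend(unlabeled[picked])         # map pool indices back to global ones
        clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
        accuracies.append(clf.score(X_test, y_test))
    return accuracies
```

With a TF-IDF representation of 20 Newsgroups or flattened MNIST pixels as X, this reproduces the shape of the reported experiments, though not their exact classifiers, objectives, or hyperparameters.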