A Coverage-Based Utility Model for Identifying Unknown Unknowns

Authors: Gagan Bansal, Daniel Weld

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on four datasets show that our method outperforms bandit-based approaches and achieves within 60.9% utility of an omniscient, tractable upper bound. We evaluate our methods on the same four classification datasets used by previous work (Lakkaraju et al. 2017).
Researcher Affiliation | Academia | Gagan Bansal, Daniel S. Weld. Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195. {bansalg, weld}@cs.washington.edu
Pseudocode | Yes | Algorithm 1: Greedy Search. (An illustrative sketch appears after this table.)
Open Source Code | Yes | To encourage follow-on research, all our code and data sets are available on aiweb.cs.washington.edu/ai/unkunk18.
Open Datasets | Yes | Pang05 (Pang and Lee 2005): This dataset contains 10k sentences from movie reviews on Rotten Tomatoes. Pang04 (Pang and Lee 2004): This dataset contains 10k sentences from IMDb plot summaries and Rotten Tomatoes movie reviews. McAuley15 (McAuley, Pandey, and Leskovec 2015): This dataset contains Amazon reviews for books and electronic items. Kaggle13: This dataset contains 25k images of cats and dogs in total, which were randomly split into a train and test set of equal size.
Dataset Splits | Yes | Kaggle13: This dataset contains 25k images of cats and dogs in total, which were randomly split into a train and test set of equal size. For all datasets, we limited the size of the test set to 5k. (A sketch of this split appears after the table.)
Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models, memory, or processor types used for running experiments.
Software Dependencies | No | The paper mentions using 'logistic regression' and 'CNN' models, and the 'kmean-both' clustering algorithm, but does not specify version numbers for any software dependencies or libraries.
Experiment Setup | Yes | For the text datasets, we used logistic regression with unigram features. For Kaggle13, we used a CNN (two convolution layers and three linear layers). To cluster the inputs, we used the kmean-both algorithm used by Lakkaraju et al. The number of clusters was selected using the elbow method. We used the following function as the similarity measure: sim(x, s) := e^(-d(x, s)/σ). We compute the new probability by smoothing between the observed frequency and the previous prior. (A sketch of the similarity measure and smoothing step appears after the table.)
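
The Pseudocode row above cites Algorithm 1 (Greedy Search). As a rough, non-authoritative illustration of a coverage-style greedy query loop, the Python sketch below picks, at each step, the unqueried candidate with the highest estimated marginal coverage gain and spends one oracle query on it. The names sim, p_uu, and oracle, and the exact form of the utility, are assumptions made for illustration, not the authors' implementation.

import numpy as np

def greedy_search(sim, p_uu, oracle, budget):
    """Illustrative greedy loop for querying likely unknown unknowns.

    sim    -- (n, n) similarity matrix, e.g. sim[i, j] = exp(-d(x_i, x_j) / sigma)
    p_uu   -- length-n estimated probability that each point is an unknown unknown
    oracle -- oracle(i) -> True if point i is actually misclassified (one query)
    budget -- number of oracle queries allowed
    """
    n = len(p_uu)
    coverage = np.zeros(n)            # how well each point is already covered
    queried, discovered = set(), []

    for _ in range(budget):
        best_i, best_gain = None, -np.inf
        for i in range(n):
            if i in queried:
                continue
            # Expected marginal utility: chance i is an unknown unknown times
            # the extra coverage it would add over all points.
            gain = p_uu[i] * np.sum(np.maximum(sim[i] - coverage, 0.0))
            if gain > best_gain:
                best_i, best_gain = i, gain
        queried.add(best_i)
        if oracle(best_i):            # spend one oracle query
            discovered.append(best_i)
            coverage = np.maximum(coverage, sim[best_i])
    return discovered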
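
The Dataset Splits row describes a random split into equal train and test halves, with the test set capped at 5k examples. A minimal sketch of that procedure follows; the NumPy generator, fixed seed, and function name are assumptions.

import numpy as np

def random_half_split(examples, test_cap=5000, seed=0):
    # Shuffle, split into two equal halves, then cap the test half at 5k.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(examples))
    half = len(examples) // 2
    train_idx, test_idx = idx[:half], idx[half:half + test_cap]
    return [examples[i] for i in train_idx], [examples[i] for i in test_idx]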
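
The Experiment Setup row quotes an exponential similarity measure and a smoothing step between observed frequency and a prior. The sketch below assumes a Euclidean distance d, a bandwidth σ passed as sigma, and a pseudo-count style interpolation; the authors' exact smoothing formula is not given in the quote.

import numpy as np

def similarity(x, s, sigma=1.0):
    # sim(x, s) := exp(-d(x, s) / sigma); d taken here as Euclidean distance.
    return np.exp(-np.linalg.norm(x - s) / sigma)

def smoothed_probability(prior, uu_observed, total_observed, strength=1.0):
    # Interpolate between the observed unknown-unknown frequency in a cluster
    # and the previous prior, weighted by a smoothing strength (assumed form).
    return (uu_observed + strength * prior) / (total_observed + strength)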