A Coverage-Based Utility Model for Identifying Unknown Unknowns

Authors: Gagan Bansal, Daniel Weld

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on four datasets show that our method outperforms bandit-based approaches and achieves within 60.9% utility of an omniscient, tractable upper bound. We evaluate our methods on the same four classification datasets used by previous work (Lakkaraju et al. 2017).
Researcher Affiliation | Academia | Gagan Bansal, Daniel S. Weld. Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195. {bansalg, weld}@cs.washington.edu
Pseudocode | Yes | Algorithm 1: Greedy Search. (An illustrative sketch appears after this table.)
Open Source Code | Yes | To encourage follow-on research, all our code and data sets are available on aiweb.cs.washington.edu/ai/unkunk18.
Open Datasets | Yes | Pang05 (Pang and Lee 2005): This dataset contains 10k sentences from movie reviews on Rotten Tomatoes. Pang04 (Pang and Lee 2004): This dataset contains 10k sentences from IMDb plot summaries and Rotten Tomatoes movie reviews. McAuley15 (McAuley, Pandey, and Leskovec 2015): This dataset contains Amazon reviews for books and electronic items. Kaggle13: This dataset contains 25k images of cats and dogs in total, which were randomly split into a train and test set of equal size.
Dataset Splits | Yes | Kaggle13: This dataset contains 25k images of cats and dogs in total, which were randomly split into a train and test set of equal size. For all datasets, we limited the size of the test set to 5k. (A sketch of this split appears after the table.)
Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models, memory, or processor types used for running experiments.
Software Dependencies | No | The paper mentions using 'logistic regression' and 'CNN' models, and the 'kmean-both' clustering algorithm, but does not specify version numbers for any software dependencies or libraries.
Experiment Setup | Yes | For the text datasets, we used logistic regression with unigram features. For Kaggle13, we used a CNN (two convolution layers and three linear layers). To cluster the inputs, we used the kmean-both algorithm used by Lakkaraju et al. The number of clusters was selected using the elbow method. We used the following function as the similarity measure: sim(x, s) := e^(-d(x, s)/σ). We compute the new probability by smoothing between the observed frequency and the previous prior. (A sketch of the similarity measure and smoothing step appears after the table.)
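
The Pseudocode row above cites Algorithm 1 (Greedy Search). As a rough, non-authoritative illustration of a coverage-style greedy query loop, the Python sketch below picks, at each step, the unqueried candidate with the highest estimated marginal coverage gain and spends one oracle query on it. The names sim, p_uu, and oracle, and the exact form of the utility, are assumptions made for illustration, not the authors' implementation.

import numpy as np

def greedy_search(sim, p_uu, oracle, budget):
    """Illustrative greedy loop for querying likely unknown unknowns.

    sim    -- (n, n) similarity matrix, e.g. sim[i, j] = exp(-d(x_i, x_j) / sigma)
    p_uu   -- length-n estimated probability that each point is an unknown unknown
    oracle -- oracle(i) -> True if point i is actually misclassified (one query)
    budget -- number of oracle queries allowed
    """
    n = len(p_uu)
    coverage = np.zeros(n)            # how well each point is already covered
    queried, discovered = set(), []

    for _ in range(budget):
        best_i, best_gain = None, -np.inf
        for i in range(n):
            if i in queried:
                continue
            # Expected marginal utility: chance i is an unknown unknown times
            # the extra coverage it would add over all points.
            gain = p_uu[i] * np.sum(np.maximum(sim[i] - coverage, 0.0))
            if gain > best_gain:
                best_i, best_gain = i, gain
        queried.add(best_i)
        if oracle(best_i):            # spend one oracle query
            discovered.append(best_i)
            coverage = np.maximum(coverage, sim[best_i])
    return discovered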
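
The Dataset Splits row describes a random split into equal train and test halves, with the test set capped at 5k examples. A minimal sketch of that procedure follows; the NumPy generator, fixed seed, and function name are assumptions.

import numpy as np

def random_half_split(examples, test_cap=5000, seed=0):
    # Shuffle, split into two equal halves, then cap the test half at 5k.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(examples))
    half = len(examples) // 2
    train_idx, test_idx = idx[:half], idx[half:half + test_cap]
    return [examples[i] for i in train_idx], [examples[i] for i in test_idx]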
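
The Experiment Setup row quotes an exponential similarity measure and a smoothing step between observed frequency and a prior. The sketch below assumes a Euclidean distance d, a bandwidth σ passed as sigma, and a pseudo-count style interpolation; the authors' exact smoothing formula is not given in the quote.

import numpy as np

def similarity(x, s, sigma=1.0):
    # sim(x, s) := exp(-d(x, s) / sigma); d taken here as Euclidean distance.
    return np.exp(-np.linalg.norm(x - s) / sigma)

def smoothed_probability(prior, uu_observed, total_observed, strength=1.0):
    # Interpolate between the observed unknown-unknown frequency in a cluster
    # and the previous prior, weighted by a smoothing strength (assumed form).
    return (uu_observed + strength * prior) / (total_observed + strength)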