A Coverage-Based Utility Model for Identifying Unknown Unknowns
Authors: Gagan Bansal, Daniel Weld
AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on four datasets show that our method outperforms bandit-based approaches and achieves within 60.9% utility of an omniscient, tractable upper bound. We evaluate our methods on the same four classification datasets used by previous work (Lakkaraju et al. 2017). |
| Researcher Affiliation | Academia | Gagan Bansal, Daniel S. Weld, Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, {bansalg, weld}@cs.washington.edu |
| Pseudocode | Yes | Algorithm 1 Greedy Search (a hedged sketch of a generic greedy-search loop appears below the table) |
| Open Source Code | Yes | To encourage follow-on research, all our code and data sets are available on aiweb.cs.washington.edu/ai/unkunk18. |
| Open Datasets | Yes | Pang05 (Pang and Lee 2005): This dataset contains 10k sentences from movie reviews on Rotten Tomatoes. Pang04 (Pang and Lee 2004): This dataset contains 10k sentences from IMDb plot summaries and Rotten Tomatoes movie reviews. McAuley15 (McAuley, Pandey, and Leskovec 2015): This dataset contains Amazon reviews for books and electronic items. Kaggle13: This dataset contains 25k images of cats and dogs in total, which were randomly split into a train and test set of equal size. |
| Dataset Splits | Yes | Kaggle13: This dataset contains 25k images of cats and dogs in total, which were randomly split into a train and test set of equal size. For all datasets, we limited the size of the test set to 5k. |
| Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models, memory, or processor types used for running experiments. |
| Software Dependencies | No | The paper mentions using 'logistic regression' and 'CNN' models, and the 'kmean-both algorithm', but does not specify version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | For the text datasets, we used logistic regression with unigram features. For Kaggle13, we used a CNN (two convolution layers and three linear layers). To cluster the inputs, we used the kmean-both algorithm used by Lakkaraju et al. The number of clusters was selected using the elbow method. We used the following function as the similarity measure: sim(x, s) := e^(−d(x, s)/σ). We compute the new probability by smoothing between the observed frequency and the previous prior. (Hedged sketches of these components appear below the table.) |
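
The paper's Algorithm 1 is a greedy search. As a point of reference, here is a minimal sketch of a generic greedy utility-maximization loop; the `utility` callback, the candidate representation, and the `budget` parameter are illustrative assumptions, not the authors' implementation:

```python
# Hypothetical sketch of a greedy search in the spirit of Algorithm 1:
# at each step, query the candidate whose addition most increases a
# coverage-based utility over the set selected so far.
def greedy_search(candidates, utility, budget):
    """Greedily select `budget` candidates, maximizing marginal utility."""
    selected = []
    remaining = list(candidates)
    for _ in range(budget):
        if not remaining:
            break
        # Score each remaining candidate by the utility of the set with it added.
        best = max(remaining, key=lambda x: utility(selected + [x]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

A greedy step costs one utility evaluation per remaining candidate, so the loop makes O(budget × |candidates|) utility calls in total.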
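
For the text datasets, the quoted setup is logistic regression with unigram features. A minimal scikit-learn sketch, assuming count-based unigrams and default regularization (neither is specified in the quote):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Unigram bag-of-words features into logistic regression; the
# hyperparameters are library defaults, not values from the paper.
text_model = make_pipeline(
    CountVectorizer(ngram_range=(1, 1)),  # unigram features only
    LogisticRegression(max_iter=1000),
)
# Usage (train_sentences / train_labels are placeholder names):
# text_model.fit(train_sentences, train_labels)
```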
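
For Kaggle13 the quote says only "two convolution layers and three linear layers". One PyTorch reading of that description, with the channel counts, kernel sizes, and 64×64 RGB input all assumed rather than taken from the paper:

```python
import torch.nn as nn

class CatsDogsCNN(nn.Module):
    """Two conv layers + three linear layers; all sizes are guesses."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 13 * 13, 256), nn.ReLU(),  # 64x64 input -> 13x13 maps
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```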
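
The clustering step uses the kmean-both algorithm of Lakkaraju et al., which clusters on both features and model confidence; the sketch below substitutes plain k-means and shows only the elbow-style scan for choosing the number of clusters:

```python
from sklearn.cluster import KMeans

def elbow_scan(X, k_values=range(2, 16)):
    """Return (k, inertia) pairs; the 'elbow' in inertia picks k."""
    return [(k, KMeans(n_clusters=k, n_init=10).fit(X).inertia_)
            for k in k_values]
```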
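
The similarity measure and smoothing step quoted above can be read as an exponentially decaying kernel plus additive smoothing. In this sketch, the choice of Euclidean distance for d, the bandwidth σ, and the smoothing strength are all assumptions:

```python
import numpy as np

def sim(x, s, sigma=1.0):
    """sim(x, s) = exp(-d(x, s) / sigma), with d taken as Euclidean distance."""
    return float(np.exp(-np.linalg.norm(np.asarray(x) - np.asarray(s)) / sigma))

def smoothed_probability(successes, trials, prior, strength=1.0):
    """Blend the observed frequency with the previous prior (additive smoothing)."""
    return (successes + strength * prior) / (trials + strength)
```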