Error Discovery By Clustering Influence Embeddings
Authors: Fulton Wang, Julius Adebayo, Sarah Tan, Diego Garcia-Olano, Narine Kokhlikyan
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show Inf Embed outperforms current state-of-the-art methods on 2 benchmarks, and is effective for model debugging across several case studies. |
| Researcher Affiliation | Collaboration | Fulton Wang (Meta); Julius Adebayo (Prescient Design / Genentech); Sarah Tan (Cornell University); Diego Garcia-Olano (Meta); Narine Kokhlikyan (Meta) |
| Pseudocode | Yes | Algorithm 2 Our SDM, Inf Embed, applies K-Means to influence embeddings of test examples. |
| Open Source Code | Yes | Code to replicate our findings is available at: https://github.com/adebayoj/infembed |
| Open Datasets | Yes | dcbench [Eyuboglu et al., 2022a] provides 1235 pre-trained models that are derived from real-world data... The Spot Check benchmark [Plumb et al., 2022]... the test split of Imagenet [Deng et al., 2009]... AGNews [Zhang et al., 2015]... bone-age classification: https://www.kaggle.com/datasets/kmader/rsna-bone-age |
| Dataset Splits | No | The paper mentions using 'training dataset' and 'test dataset' from standard benchmarks like Imagenet and AGNews, and the Boneage dataset. While these benchmarks typically have predefined splits, the paper does not explicitly state the exact percentages or sample counts for training, validation, and test splits within its text, nor does it specify how any custom validation sets were created. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or cloud computing instance specifications used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, scikit-learn versions) required to reproduce the experiments. |
| Experiment Setup | Yes | For all experiments, we use Arnoldi dimension P = 500, and influence embedding dimension D = 100, unless noted otherwise. In the experiments that use Inf Embed-Rule, we used branching factor B=3. The rationale is that B should not be too large, to avoid unnecessarily dividing large slices with sufficiently low accuracy into smaller slices. In practice, B=2 and B=3 did not give qualitatively different results. |
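The core procedure the table refers to (Algorithm 2 of the paper) clusters influence embeddings of test examples with K-Means and inspects the resulting slices for concentrated errors. A minimal sketch of that slicing step is below, assuming the influence embeddings have already been computed from the trained model (the paper uses D = 100; here random toy vectors with two planted groups stand in, and the `kmeans` helper and `correct` labels are hypothetical illustrations, not the authors' code):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's iterations: cluster the rows of X into k groups."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign each row to its nearest center (squared Euclidean distance)
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # recompute each center as the mean of its assigned rows
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

# Toy stand-in for influence embeddings of 100 test examples:
# two well-separated groups, one of which the model gets wrong.
rng = np.random.default_rng(1)
emb = np.concatenate([rng.normal(0.0, 1.0, (50, 10)),
                      rng.normal(5.0, 1.0, (50, 10))])
correct = np.concatenate([np.ones(50), np.zeros(50)])  # hypothetical per-example correctness

labels = kmeans(emb, k=2)
for j in range(2):
    acc = correct[labels == j].mean()
    print(f"slice {j}: {np.sum(labels == j)} examples, accuracy {acc:.2f}")
```

In the actual method the low-accuracy slices surfaced this way are the candidate error groups handed to a human for inspection; the planted low-accuracy group here plays that role.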