Error Discovery By Clustering Influence Embeddings

Authors: Fulton Wang, Julius Adebayo, Sarah Tan, Diego Garcia-Olano, Narine Kokhlikyan

NeurIPS 2023

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | We show Inf Embed outperforms current state-of-the-art methods on 2 benchmarks, and is effective for model debugging across several case studies.
Researcher Affiliation | Collaboration | Fulton Wang (Meta), Julius Adebayo (Prescient Design / Genentech), Sarah Tan (Cornell University), Diego Garcia-Olano (Meta), Narine Kokhlikyan (Meta)
Pseudocode | Yes | Algorithm 2: Our SDM, Inf Embed, applies K-Means to influence embeddings of test examples.
Open Source Code | Yes | Code to replicate our findings is available at: https://github.com/adebayoj/infembed
Open Datasets | Yes | dcbench [Eyuboglu et al., 2022a] provides 1235 pre-trained models that are derived from real-world data... The Spot Check benchmark [Plumb et al., 2022]... the test split of Imagenet [Deng et al., 2009]... AGNews [Zhang et al., 2015]... bone-age classification (https://www.kaggle.com/datasets/kmader/rsna-bone-age)
Dataset Splits | No | The paper mentions using a 'training dataset' and 'test dataset' from standard benchmarks such as Imagenet and AGNews, and the bone-age dataset. While these benchmarks typically have predefined splits, the paper does not explicitly state the percentages or sample counts for training, validation, and test splits, nor does it specify how any custom validation sets were created.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or cloud computing instance specifications used for running the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, scikit-learn) required to reproduce the experiments.
Experiment Setup | Yes | For all experiments, we use Arnoldi dimension P = 500 and influence embedding dimension D = 100, unless noted otherwise. In the experiments that use Inf Embed-Rule, we used branching factor B = 3. The rationalele is that B should not be too large, to avoid unnecessarily dividing large slices with sufficiently low accuracy into smaller slices. In practice, B = 2 and B = 3 did not give qualitatively different results.
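The slice-discovery step described in the Pseudocode and Experiment Setup rows (K-Means over influence embeddings, with embedding dimension D = 100) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the real influence embeddings come from the paper's Arnoldi-based low-rank approximation (dimension P = 500), which we stand in for with random data here, and the cluster count K is an arbitrary choice for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hyperparameters: D = 100 matches the paper's influence embedding
# dimension; the Arnoldi dimension P = 500 governs the earlier
# low-rank approximation step, which is not shown here.
D = 100
K = 10  # number of candidate slices; illustrative, not from the paper

# Stand-in for influence embeddings of N test examples. In Inf Embed
# these are derived from an Arnoldi-based factorization involving the
# training loss Hessian, so that dot products approximate influence.
rng = np.random.default_rng(0)
test_embeddings = rng.normal(size=(2000, D))

# The slice-discovery step: apply K-Means to the test embeddings.
kmeans = KMeans(n_clusters=K, n_init=10, random_state=0)
slice_labels = kmeans.fit_predict(test_embeddings)

# Each cluster is a candidate error slice; in practice one would
# inspect the clusters with the lowest model accuracy first.
for k in range(K):
    members = np.flatnonzero(slice_labels == k)
    print(f"slice {k}: {len(members)} test examples")
```

The point of clustering influence embeddings rather than raw features is that examples whose predictions are influenced by the same training data land in the same slice, which is what makes the discovered slices useful for debugging.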