Prospector Heads: Generalized Feature Attribution for Large Models & Data

Authors: Gautam Machiraju, Alexander Derry, Arjun D. Desai, Neel Guha, Amir-Hossein Karimi, James Zou, Russ B. Altman, Christopher Ré, Parag Mallick

ICML 2024

Reproducibility Variable — Result — LLM Response
Research Type — Experimental — "We evaluate prospectors using three primary tasks, each representing a different data modality (sequences, images, and graphs). Prospectors outperform baseline attribution methods in region localization and generalize across data modalities. In all tasks, prospectors achieve higher AUPRC and AP than baseline methods, often with large improvements (Figure 6)."
Researcher Affiliation — Collaboration — "1 Department of Biomedical Data Science, Stanford University; 2 Cartesia AI; 3 Department of Computer Science, Stanford University; 4 Department of Electrical & Computer Engineering, University of Waterloo; 5 Department of Radiology, Stanford University."
Pseudocode — Yes — "Algorithm 1 rollup"
Open Source Code — Yes — "Our code is made available at: https://github.com/gmachiraju/K2."
Open Datasets — Yes — "We use the WikiSection (Arnold et al., 2019) benchmark dataset"; "We evaluate prospectors on Camelyon16 (Ehteshami Bejnordi et al., 2017)"; "MetalPDB (Putignano et al., 2018), a curated dataset derived from the Protein Data Bank (PDB) (Berman et al., 2002)."
Dataset Splits — Yes — "Due to the MIA, the best models were selected based on their ability to localize ground-truth class-1 regions in the training set, since these were not seen by prospectors during training. To select a top prospector configuration after the training grid search, we first compute four token-level evaluation metrics for training-set localization and apply sequential ranking over those chosen metrics."
Hardware Specification — Yes — "trained for 20 epochs on a single NVIDIA T4 GPU"; "trained for 30 epochs on a single NVIDIA T4 GPU"
Software Dependencies — No — The paper mentions software packages such as the sklearn, shap, and transformers Python packages, Captum, and PyTorch Geometric, but does not provide version numbers for these dependencies, which are necessary for reproducibility.
Experiment Setup — Yes — "For each task, we conduct a grid search of hyperparameter configurations to select an optimal prospector model. The prospector kernel has two main hyperparameters: the number of concepts k and the skip-gram neighborhood radius r... We describe all tested hyperparameters in our training grid search in Table S3. The MLP was implemented with the sklearn package and trained with one hidden layer (dimension 100), ReLU activations, the Adam optimizer, an L2-regularization term of 1e-4, an initial learning rate of 1e-3, and a minibatch size of 200."
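The Research Type row above reports localization performance via AUPRC and AP. These two metrics are closely related but computed differently; a minimal scikit-learn sketch (with toy token labels and attribution scores, not data from the paper) shows both:

```python
# Toy illustration of the two localization metrics named above: AUPRC
# (trapezoidal area under the precision-recall curve) and AP (step-wise
# average precision). Labels/scores below are made up for illustration.
import numpy as np
from sklearn.metrics import auc, average_precision_score, precision_recall_curve

y_true = np.array([0, 0, 1, 1, 0, 1])                 # ground-truth class-1 tokens
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])   # per-token attribution scores

precision, recall, _ = precision_recall_curve(y_true, y_score)
auprc = auc(recall, precision)                        # area under the PR curve
ap = average_precision_score(y_true, y_score)         # step-wise approximation
```

AP is generally preferred over trapezoidal AUPRC because linear interpolation on PR curves can be overly optimistic, which may be why the paper reports both.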
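The Dataset Splits row describes selecting a top configuration by "sequential ranking" over several training-set metrics. The paper does not spell out the procedure, but one common reading is lexicographic ranking: order configurations by the first metric, break ties with the second, and so on. A hypothetical sketch (metric names and values are illustrative, not from the paper):

```python
# Hypothetical "sequential ranking" over evaluation metrics: sort
# configurations lexicographically, so ties on the first metric are
# broken by the second. All names and numbers here are made up.
configs = [
    {"name": "k16_r1", "auprc": 0.71, "ap": 0.69},
    {"name": "k8_r2",  "auprc": 0.74, "ap": 0.66},
    {"name": "k16_r2", "auprc": 0.74, "ap": 0.72},
]

def rank_key(cfg, metrics=("auprc", "ap")):
    # Higher is better for every metric, so negate for an ascending sort.
    return tuple(-cfg[m] for m in metrics)

best = min(configs, key=rank_key)  # ties on auprc broken by ap
```

Under this reading, `k16_r2` wins: it ties `k8_r2` on the first metric and beats it on the second.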
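The Experiment Setup row fully specifies the downstream MLP's hyperparameters, which map directly onto scikit-learn's `MLPClassifier`. A sketch of that configuration (assuming `MLPClassifier` is the sklearn class the paper used, which the quote does not state explicitly):

```python
# Sketch of the MLP described above, assuming sklearn's MLPClassifier:
# one hidden layer of 100 units, ReLU, Adam, L2 term 1e-4, initial
# learning rate 1e-3, minibatch size 200.
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(
    hidden_layer_sizes=(100,),  # one hidden layer, dimension 100
    activation="relu",
    solver="adam",
    alpha=1e-4,                 # L2-regularization term
    learning_rate_init=1e-3,
    batch_size=200,
)
```

Notably, `hidden_layer_sizes=(100,)`, `activation="relu"`, `solver="adam"`, and `alpha=1e-4` are also sklearn's defaults, so the paper's MLP is close to an out-of-the-box configuration.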