Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Anchors: High-Precision Model-Agnostic Explanations

Authors: Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the flexibility of anchors by explaining a myriad of different models for different domains and tasks. In a user study, we show that anchors enable users to predict how a model would behave on unseen instances with less effort and higher precision, as compared to existing linear explanations or no explanations. We evaluate anchor explanations for complex models on a number of tasks, primarily focusing on how they facilitate accurate predictions by users (simulated and human) on the behavior of the models on unseen instances."
Researcher Affiliation | Collaboration | Marco Tulio Ribeiro (University of Washington), Sameer Singh (University of California, Irvine), Carlos Guestrin (University of Washington). "This work was supported in part by ONR award #N00014-13-1-0023, and in part by FICO and Adobe Research."
Pseudocode | Yes | "Alg. 1 presents an outline of this approach." Algorithm 1: Identifying the Best Candidate for Greedy; Algorithm 2: Outline of the Beam Search.
Open Source Code | Yes | "Code and the data for all the experiments is available at https://github.com/marcotcr/anchor-experiments."
Open Datasets | Yes | "For simulated users, we use the tabular datasets previously mentioned (adult, rcdv and lending). Code and the data for all the experiments is available at https://github.com/marcotcr/anchor-experiments."
Dataset Splits | Yes | "Each dataset is split such that models are trained with the training set, explanations are produced for instances in the validation set, and evaluated on instances in the test set."
Hardware Specification | No | The paper discusses running experiments and generating explanations but does not specify hardware details such as CPU/GPU models or memory.
Software Dependencies | No | The paper does not state version numbers for the software libraries used in its implementation or experiments.
Experiment Setup | Yes | "We set these parameters to reasonable values, B = 10, ϵ = 0.1, δ = 0.05, and leave an analysis of the sensitivity of our approach to these for future work. For each dataset, we train three different models: logistic regression (lr), 400 gradient boosted trees (gb) and a multilayer perceptron with two layers of 50 units each (nn)."
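To make the Pseudocode row concrete, here is a heavily simplified, illustrative sketch of greedy anchor construction. All names (`greedy_anchor`, `precision`, the uniform perturbation distribution, and the fixed sample size `n`) are our assumptions for illustration; the paper's actual Algorithms 1 and 2 use KL-LUCB confidence bounds and a beam search of width B, which this sketch omits.

```python
# Illustrative sketch of greedy anchor search (NOT the paper's exact
# Algorithm 1/2, which uses KL-LUCB bounds and beam search with width B).
import random

def precision(rule, model, instance, sample_fn, n=500):
    """Estimate P(model(z) == model(instance)) over perturbations z that
    satisfy the rule (here: a set of feature indices fixed to the
    instance's values)."""
    target = model(instance)
    hits = 0
    for _ in range(n):
        z = sample_fn(instance, rule)
        hits += (model(z) == target)
    return hits / n

def greedy_anchor(model, instance, sample_fn, tau=0.95):
    """Greedily add the predicate that most increases estimated precision,
    until the anchor's precision reaches the threshold tau."""
    rule = set()
    features = set(range(len(instance)))
    while precision(rule, model, instance, sample_fn) < tau and features - rule:
        best = max(features - rule,
                   key=lambda f: precision(rule | {f}, model, instance, sample_fn))
        rule.add(best)
    return rule

# Toy example: the model fires exactly when feature 0 > 0, so the anchor
# should fix feature 0 and nothing else.
def toy_model(x):
    return int(x[0] > 0)

def toy_sample(instance, rule):
    # Perturb every unfixed feature uniformly in [-1, 1].
    return [instance[i] if i in rule else random.uniform(-1, 1)
            for i in range(len(instance))]

random.seed(0)
print(greedy_anchor(toy_model, [0.8, -0.3, 0.5], toy_sample))
```

With feature 0 fixed, every perturbation satisfies `x[0] > 0`, so the estimated precision is exactly 1.0 and the search stops after a single greedy step.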
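The Experiment Setup and Dataset Splits rows above can be sketched as follows. The scikit-learn class names and the synthetic stand-in data are our assumptions — the paper does not name a library, and its experiments use the adult, rcdv, and lending tabular datasets; only the model hyperparameters (lr, 400-tree gb, 50×50 nn) and the train/validation/test split come from the paper.

```python
# Sketch of the stated model setup and three-way split (assumed sklearn API;
# synthetic data stands in for the adult/rcdv/lending datasets).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=0)

# Models are trained on the training set; explanations would be produced for
# validation instances and evaluated on test instances.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

models = {
    "lr": LogisticRegression(max_iter=1000),
    "gb": GradientBoostingClassifier(n_estimators=400),              # 400 boosted trees
    "nn": MLPClassifier(hidden_layer_sizes=(50, 50), max_iter=500),  # two layers of 50 units
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, round(model.score(X_test, y_test), 3))
```

The B, ϵ, and δ values quoted in the table parameterize the anchor search itself (beam width and the tolerance/confidence of its precision estimates), not these models.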