Towards Rigorous Interpretations: a Formalisation of Feature Attribution
Authors: Darius Afchar, Vincent Guigue, Romain Hennequin
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By computing ground-truth attributions on synthetic datasets, we evaluate many state-of-the-art attribution methods and show that, even when optimised, some fail to verify the proposed properties and provide wrong solutions. (Section 4, Experiments) Armed with a formalism, we generate synthetic distributions with instance-wise ground-truth selections to evaluate attribution methods' approximate selection performance and check their solution structure. All generated data, implementations and evaluation methods are available and fully reproducible at our paper's code repository. |
| Researcher Affiliation | Collaboration | Darius Afchar 1 2 Romain Hennequin 1 Vincent Guigue 2 1Deezer Research, Paris, France 2LIP6, Paris, France. |
| Pseudocode | No | The paper describes its methods textually and does not include any labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | All generated data, implementations and evaluation methods are available and fully reproducible at our paper's code repository. Source code at github.com/deezer/functional_attribution |
| Open Datasets | Yes | By computing ground-truth attributions on synthetic datasets, we evaluate many state-of-the-art attribution methods... we generate synthetic supervised tasks and abstract models from the task by replacing them with optimal distributions or mappings... All generated data, implementations and evaluation methods are available and fully reproducible at our paper's code repository. Source code at github.com/deezer/functional_attribution |
| Dataset Splits | Yes | We generate 1000 supervised tasks with ground-truth unique univariate selections (S(c_j) is a singleton for all centroids), and 1000 tasks with unique multivariate selections (S(c_j) has cardinality k(c_j) and is chosen among the (n choose k(c_j)) possible subsets). We additionally generate 100 multivariate tasks to tune η for each method. |
| Hardware Specification | No | The paper reports computation times (T) in its result tables, but it does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper discusses the use of neural networks and various methods, but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We additionally generate 100 multivariate tasks to tune η for each method. The selector-predictors are the only methods for which we have to sample from p(X, Y) and train two neural networks. We evaluate L2X (Chen et al., 2018), which uses a fixed number of sampled selection dimensions, and INVASE (Yoon et al., 2019), which notably replaces this constraint with a Lagrangian penalty in its objective. |
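The evaluation protocol described above (synthetic tasks with known per-centroid feature selections S(c_j), against which an attribution method's top features are scored) can be sketched as follows. This is a hypothetical minimal illustration, not the authors' implementation: the sampling scheme, noise scale, and the `selection_accuracy` metric name are assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each class centroid c_j depends on exactly one known
# feature (a univariate selection S(c_j)), so the ground-truth attribution
# for a sample is determined by its centroid.
n_features = 5
n_centroids = 3

# One relevant feature index per centroid (S(c_j) is a singleton).
ground_truth = rng.choice(n_features, size=n_centroids, replace=False)
centroids = rng.normal(size=(n_centroids, n_features))

def sample_task(n_samples=100):
    """Draw noisy samples around the centroids; label = centroid index."""
    labels = rng.integers(n_centroids, size=n_samples)
    X = centroids[labels] + 0.1 * rng.normal(size=(n_samples, n_features))
    return X, labels

def selection_accuracy(attributions, labels):
    """Fraction of samples whose top-attributed feature matches S(c_j)."""
    top = np.argmax(np.abs(attributions), axis=1)
    return float(np.mean(top == ground_truth[labels]))

X, y = sample_task()

# Sanity check with an "oracle" attribution that is one-hot on the true
# feature of each sample's centroid; it must score perfectly.
oracle = np.zeros_like(X)
oracle[np.arange(len(y)), ground_truth[y]] = 1.0
print(selection_accuracy(oracle, y))
```

In the paper's actual protocol, the attributions fed to such a metric would come from the evaluated methods (e.g. L2X or INVASE selector outputs) rather than an oracle; the oracle here only demonstrates that the scoring is wired correctly.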