Towards Rigorous Interpretations: a Formalisation of Feature Attribution
Authors: Darius Afchar, Vincent Guigue, Romain Hennequin
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By computing ground-truth attributions on synthetic datasets, we evaluate many state-of-the-art attribution methods and show that, even when optimised, some fail to verify the proposed properties and provide wrong solutions. (Section 4, Experiments) Armed with a formalism, we generate synthetic distributions with instance-wise ground-truth selections to evaluate attribution methods' approximate selection performance and check their solution structure. All generated data, implementations and evaluation methods are available and fully reproducible at our paper's code repository. |
| Researcher Affiliation | Collaboration | Darius Afchar 1 2 Romain Hennequin 1 Vincent Guigue 2 1Deezer Research, Paris, France 2LIP6, Paris, France. |
| Pseudocode | No | The paper describes its methods textually and does not include any labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | All generated data, implementations and evaluation methods are available and fully reproducible at our paper's code repository. Source code at github.com/deezer/functional_attribution |
| Open Datasets | Yes | By computing ground-truth attributions on synthetic datasets, we evaluate many state-of-the-art attribution methods... we generate synthetic supervised tasks and abstract models from the task by replacing them with optimal distributions or mappings... All generated data, implementations and evaluation methods are available and fully reproducible at our paper's code repository. Source code at github.com/deezer/functional_attribution |
| Dataset Splits | Yes | We generate 1000 supervised tasks with ground-truth unique univariate selections (S(c_j) is a singleton for all centroids), and 1000 tasks with unique multivariate selections (S(c_j) has cardinality k(c_j) and is chosen among the (n choose k(c_j)) possible subsets). We additionally generate 100 multivariate tasks to tune η for each method. |
| Hardware Specification | No | The paper reports computation times (T) in its result tables, but it does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper discusses the use of neural networks and various methods, but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We additionally generate 100 multivariate tasks to tune η for each method. The selector-predictors are the only methods for which we have to sample from p(X, Y) and train two neural networks. We evaluate L2X (Chen et al., 2018), which uses a fixed number of sampled selection dimensions, and INVASE (Yoon et al., 2019), which notably replaces this constraint with a Lagrangian penalty in its objective. |
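The evaluation protocol described above (synthetic tasks with known per-centroid feature selections S(c_j), against which an attribution method's top features are scored) can be sketched as follows. This is a hypothetical minimal illustration, not the authors' implementation: the sampling scheme, noise scale, and the `selection_accuracy` metric name are assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each class centroid c_j depends on exactly one known
# feature (a univariate selection S(c_j)), so the ground-truth attribution
# for a sample is determined by its centroid.
n_features = 5
n_centroids = 3

# One relevant feature index per centroid (S(c_j) is a singleton).
ground_truth = rng.choice(n_features, size=n_centroids, replace=False)
centroids = rng.normal(size=(n_centroids, n_features))

def sample_task(n_samples=100):
    """Draw noisy samples around the centroids; label = centroid index."""
    labels = rng.integers(n_centroids, size=n_samples)
    X = centroids[labels] + 0.1 * rng.normal(size=(n_samples, n_features))
    return X, labels

def selection_accuracy(attributions, labels):
    """Fraction of samples whose top-attributed feature matches S(c_j)."""
    top = np.argmax(np.abs(attributions), axis=1)
    return float(np.mean(top == ground_truth[labels]))

X, y = sample_task()

# Sanity check with an "oracle" attribution that is one-hot on the true
# feature of each sample's centroid; it must score perfectly.
oracle = np.zeros_like(X)
oracle[np.arange(len(y)), ground_truth[y]] = 1.0
print(selection_accuracy(oracle, y))
```

In the paper's actual protocol, the attributions fed to such a metric would come from the evaluated methods (e.g. L2X or INVASE selector outputs) rather than an oracle; the oracle here only demonstrates that the scoring is wired correctly.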