Attribution Quality Metrics with Magnitude Alignment

Authors: Chase Walker, Dominic Simon, Kenny Chen, Rickard Ewetz

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we propose Magnitude Aligned Scoring (MAS), a new attribution quality metric that measures the alignment between the magnitude of the attributions and the model response. In the experimental evaluation, we compare the MAS metric with existing metrics across a wide range of models, datasets, attributions, and evaluations. The results demonstrate that the MAS metric is 4x more sensitive to attribution changes, 2x more consistent, and 1.6x more invariant to baseline modifications. (A hedged sketch of the magnitude-alignment idea is given after this table.)
Researcher Affiliation | Collaboration | Chase Walker¹, Dominic Simon¹, Kenny Chen², Rickard Ewetz¹; ¹Department of Electrical and Computer Engineering, University of Central Florida, Orlando, FL, USA; ²Lockheed Martin, Orlando, FL, USA
Pseudocode | No | The paper provides mathematical definitions and descriptions of the metric but does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Our code and the referenced appendix are publicly available via https://github.com/chasewalker26/Magnitude-Aligned-Scoring.
Open Datasets | Yes | We employ the Imagenet [Russakovsky et al., 2015] and RESISC45 [Cheng et al., 2017] datasets across our experiments.
Dataset Splits | No | The paper uses images from the ImageNet and RESISC45 datasets (e.g., 1000 or 5000 images for evaluation), but it does not specify how these images are split into training, validation, or test sets for its experiments. The models themselves (ResNet 101, ViT-Base 16) are pre-trained, and the paper focuses on evaluating attribution metrics rather than training new models with specific splits. (A sketch of loading such pre-trained models appears after this table.)
Hardware Specification | Yes | The evaluations are executed on an internal cluster with NVIDIA A40 GPUs.
Software Dependencies | No | The paper states: "All evaluations are performed with PyTorch [Paszke et al., 2019]". While PyTorch is mentioned, a specific version number (e.g., PyTorch 1.x.x) is not provided, only a citation to the paper describing it. Thus, the requirement of specific version numbers for software dependencies is not met. (A snippet for recording exact versions appears after this table.)
Experiment Setup | No | The paper describes the general evaluation process and the modifications made to attributions (e.g., adding a constant offset or noise), and it mentions using existing metric repositories (PIC, RISE). However, it does not provide specific hyperparameters for model training (e.g., learning rate, batch size, number of epochs) for the models (ResNet 101, ViT-Base 16) or detailed configurations for the experimental setup beyond the high-level descriptions. (A sketch of these attribution modifications appears after this table.)
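
For context on the MAS description quoted in the Research Type row, the following is a minimal sketch of a magnitude-alignment style score, assuming an insertion-style perturbation and a Pearson-correlation alignment measure. The function name `magnitude_alignment_score`, the normalization choices, and the correlation-based alignment are illustrative assumptions, not the authors' exact formulation; the actual MAS definition is in the paper and the linked repository.

```python
# Hedged sketch of a magnitude-alignment style score (NOT the authors' exact MAS).
# Assumptions: the attribution has the same shape as the image, pixels are inserted
# from the baseline image in order of decreasing attribution magnitude, and the
# alignment between cumulative attribution mass and model response is measured
# with a Pearson correlation.
import torch

def magnitude_alignment_score(model, image, baseline, attribution, target, steps=50):
    """Illustrative alignment between attribution magnitude and model response."""
    attr = attribution.abs().flatten()
    order = torch.argsort(attr, descending=True)        # most important pixels first
    flat_img, flat_base = image.flatten(), baseline.flatten()

    total = attr.sum().clamp_min(1e-12)
    attr_curve, response_curve = [], []
    for k in torch.linspace(0, attr.numel(), steps).long().tolist():
        idx = order[:k]
        x = flat_base.clone()
        x[idx] = flat_img[idx]                           # insert the top-k pixels
        with torch.no_grad():
            logits = model(x.view_as(image).unsqueeze(0))
            prob = torch.softmax(logits, dim=1)[0, target]
        attr_curve.append(attr[idx].sum() / total)       # cumulative attribution mass
        response_curve.append(prob)

    a = torch.stack(attr_curve)
    r = torch.stack(response_curve)
    r = (r - r.min()) / (r.max() - r.min() + 1e-12)      # normalize response to [0, 1]
    # Alignment as Pearson correlation between the two curves (illustrative choice).
    a_c, r_c = a - a.mean(), r - r.mean()
    return (a_c * r_c).sum() / (a_c.norm() * r_c.norm() + 1e-12)
```

The model should be in eval mode before calling the function; the returned value lies in [-1, 1], with higher values indicating closer alignment under these assumptions.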
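The Dataset Splits row notes that the evaluated ResNet 101 and ViT-Base 16 models are pre-trained. The snippet below shows one common way to obtain such ImageNet-pretrained checkpoints with torchvision; whether the authors used torchvision, timm, or other checkpoints is not stated here, so the specific weight enums are an assumption.

```python
# Assumption: torchvision's standard ImageNet-pretrained checkpoints stand in for
# the "pre-trained ResNet 101 / ViT-Base 16" mentioned above (torchvision >= 0.13).
import torch
from torchvision.models import resnet101, ResNet101_Weights, vit_b_16, ViT_B_16_Weights

device = "cuda" if torch.cuda.is_available() else "cpu"

resnet = resnet101(weights=ResNet101_Weights.IMAGENET1K_V2).to(device).eval()
vit = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1).to(device).eval()

# Each weight enum carries its matching preprocessing (resize, crop, normalization).
preprocess = ResNet101_Weights.IMAGENET1K_V2.transforms()
```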
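Because the Software Dependencies row flags the missing version numbers, a small snippet like the following can record the exact environment next to the results; the filename `environment.txt` is just an illustrative choice.

```python
# Record library versions alongside experiment outputs for reproducibility.
import sys
import torch
import torchvision

with open("environment.txt", "w") as f:
    f.write(f"python {sys.version.split()[0]}\n")
    f.write(f"torch {torch.__version__}\n")
    f.write(f"torchvision {torchvision.__version__}\n")
    f.write(f"cuda {torch.version.cuda}\n")   # None on CPU-only builds
```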
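The Experiment Setup row mentions modifying attributions with a constant offset or added noise to probe how metrics react. Below is a hedged sketch of such modifications; the offset and noise scales, and the use of Gaussian noise, are illustrative assumptions rather than the paper's exact settings.

```python
# Hedged sketch: two simple attribution modifications used to probe metric
# sensitivity and invariance (scales chosen arbitrarily for illustration).
import torch

def add_constant_offset(attribution: torch.Tensor, offset: float = 0.1) -> torch.Tensor:
    """Shift every attribution value by a constant; the pixel ranking is unchanged,
    but the magnitudes (and hence magnitude-aware metrics) are affected."""
    return attribution + offset

def add_gaussian_noise(attribution: torch.Tensor, std: float = 0.05,
                       seed: int = 0) -> torch.Tensor:
    """Perturb attribution values with Gaussian noise to test metric sensitivity."""
    gen = torch.Generator().manual_seed(seed)
    noise = torch.randn(attribution.shape, generator=gen) * std
    return attribution + noise
```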