Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].
Relevant Irrelevance: Generating Alterfactual Explanations for Image Classifiers
Authors: Silvan Mertes, Tobias Huber, Christina Karle, Katharina Weitz, Ruben Schlagowski, Cristina Conati, Elisabeth André
IJCAI 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we demonstrate the feasibility of alterfactual explanations for black box image classifiers. ... Further, we present a user study that gives interesting insights on how alterfactual explanations can complement counterfactual explanations. |
| Researcher Affiliation | Academia | ¹University of Augsburg, Germany; ²Fraunhofer HHI, Germany; ³University of British Columbia, Canada |
| Pseudocode | No | The paper describes network architectures in text and figures, but does not include structured pseudocode or algorithm blocks in the main text. |
| Open Source Code | Yes | Our full implementation is open-source and available at https://github.com/hcmlab/Alterfactuals. |
| Open Datasets | Yes | To assess the performance of our approach, we applied it to the Fashion-MNIST data set [Xiao et al., 2017]. |
| Dataset Splits | No | The paper specifies a train (6,000 images per class) and test (1,000 images per class) split for the Fashion-MNIST dataset, but does not explicitly mention a validation set split in the main text. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments are explicitly mentioned in the paper. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | To create the classifier to be explained, we trained a relatively simple four-layer convolutional neural network, achieving an accuracy of 96.7% after 40 training epochs. The exact architecture and training configuration can be found in the appendix. |