Explaining Random Forests Using Bipolar Argumentation and Markov Networks

Authors: Nico Potyka, Xiang Yin, Francesca Toni

AAAI 2023

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental
"As the computational complexity of the problems is high, we consider a probabilistic algorithm to approximate reasons and present first experimental results. We tested our algorithm on three datasets. The Iris and PIMA datasets are continuous datasets that have been considered for counterfactual explanations (White and d'Avila Garcez 2020). In addition, we consider the Mushroom dataset that contains discrete features."
Researcher Affiliation: Academia
"Department of Computing, Imperial College London, London, UK. {n.potyka, x.yin20, f.toni}@imperial.ac.uk"
Pseudocode: Yes
"Figure 2: Probabilistic approximation algorithm for estimating the percentage of non-ambiguous inputs, and the probabilities of sufficient and necessary queries."
Open Source Code: Yes
"https://github.com/nicopotyka/Uncertainpy, folder examples/explanations/random Forests."
Open Datasets: Yes
"We tested our algorithm on three datasets. The Iris and PIMA datasets are continuous datasets that have been considered for counterfactual explanations (White and d'Avila Garcez 2020). In addition, we consider the Mushroom dataset that contains discrete features. For reproducibility, the datasets are contained in the source folder."
Dataset Splits: No
The paper mentions using datasets for testing but does not explicitly specify training, validation, and test splits with percentages or sample counts, nor does it cite predefined splits.
Hardware Specification: Yes
"We generated 10,000 samples for the first stage in less than one minute on a Windows laptop with an i7-11800H CPU and 16 GB RAM."
Software Dependencies: No
The paper states that the implementation is in Python but does not provide version numbers for Python or any other libraries or software dependencies.
Experiment Setup: Yes
"We chose δ = 0.9. Our implementation works in two stages. The first stage is analogous to Figure 2 and the queries are the atomic sufficient and necessary queries of the form (Uy | Ui) and (Ui | Uy) for all combinations of feature arguments Ui and class arguments Uy. [...] For every pair (ui, uj), the probability can be estimated quickly. However, since there can be a large number of pairs, the overall runtime can be long and the almost sufficient reasons of size 2 are reported continuously while the sampling procedure is running. We generated 10,000 samples for the first stage in less than one minute."
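The sampling-based estimation described in the experiment setup can be illustrated with a plain Monte Carlo sketch. Everything below is an illustrative assumption, not the authors' implementation: the toy forest of decision stumps, the uniform input distribution on [0, 1]^2, and the example query P(class "B" | feature 0 ≥ 0.5) standing in for an atomic sufficient query (Uy | Ui).

```python
import random

# Hypothetical tiny random forest: each tree is a decision stump given as
# (feature index, threshold, class if below, class if at/above threshold).
FOREST = [
    (0, 0.5, "A", "B"),
    (1, 0.3, "A", "B"),
    (0, 0.7, "B", "A"),
]

def predict(x):
    """Majority vote over the stumps."""
    votes = {}
    for feat, thr, below, above in FOREST:
        label = below if x[feat] < thr else above
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

def estimate_conditional(n_samples, condition, target, seed=0):
    """Monte Carlo estimate of P(prediction == target | condition(x)),
    sampling inputs uniformly from [0, 1]^2 (an assumed input distribution)."""
    rng = random.Random(seed)
    hits = total = 0
    for _ in range(n_samples):
        x = [rng.random(), rng.random()]
        if condition(x):
            total += 1
            hits += predict(x) == target
    return hits / total if total else float("nan")

# Example atomic sufficient query: P(class "B" | feature 0 >= 0.5),
# estimated from 10,000 samples as in the paper's first stage.
p = estimate_conditional(10_000, lambda x: x[0] >= 0.5, "B")
print(f"estimated P(B | x0 >= 0.5) = {p:.3f}")
```

Each pairwise query is cheap to estimate this way; as the quoted setup notes, the cost comes from the number of (ui, uj) pairs, which is why results are reported continuously while sampling runs.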