Very Fast, Approximate Counterfactual Explanations for Decision Forests

Authors: Miguel Á. Carreira-Perpiñán, Suryabhan Singh Hada

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We propose a simple but very effective approach: we constrain the optimization to only those input space regions defined by the forest that are populated by actual data points. ... In Sections 4–5 we give our algorithm (LIRE) and evaluate it in Section 6. ... Fig. 5 shows that the approximation (in terms of the distance to the source instance) is quite good... Table 1 compares LIRE with searching on the dataset instances, Feature Tweak (Tolomei et al. 2017) and OCEAN (Parmentier and Vidal 2021). We use several classification datasets of different size, dimensionality and type, and axis-aligned forests (Random Forests) of different size."
Researcher Affiliation | Academia | Miguel Á. Carreira-Perpiñán, Suryabhan Singh Hada, Dept. Computer Science & Engineering, University of California, Merced; {mcarreira-perpinan, shada}@ucmerced.edu
Pseudocode | Yes | "Figure 1: Pseudocode for finding all nonempty regions (top) and all live regions (bottom)" ... "Figure 4: Pseudocode for the search for the closest live region to a source instance x for axis-aligned (top) and oblique trees (bottom)."
Open Source Code | No | The paper neither states that code for the method is released nor links to a code repository.
Open Datasets | Yes | Fig. 2 shows results for Random Forests (Breiman 2001) on several small datasets, and Table 1 compares different CE algorithms across datasets and Random Forests. Datasets, listed as (size, dimensionality, classes): breast cancer (559, 9, 2), climate (432, 18, 2), spambase (3.6k, 57, 2), yeast (1162, 8, 10), letter (16k, 16, 26), MNIST (55k, 784, 10), MiniBooNE (104051, 50, 2), Swarm (18647, 2400, 2).
Dataset Splits | No | The paper mentions the 'training, validation and test datasets used to train the forest' but gives no percentages, sample counts, or citations to predefined splits, so the data partitioning cannot be reproduced.
Hardware Specification | No | The paper states 'All runtimes were obtained in a single core (without parallel processing)' and notes that for axis-aligned trees the search takes under 1 second 'on a laptop', but it gives no exact hardware details such as CPU/GPU models or memory amounts.
Software Dependencies | No | The paper mentions tools such as Random Forests, CART, TAO, and MATLAB (in a footnote), but gives no version numbers for any software dependency needed to replicate the experiments.
Experiment Setup | No | The paper describes the forests used ('Random Forests', 'oblique trees trained with TAO'), notes that individual trees are grown in full (i.e., not pruned), and reports average tree depth and number of leaves, but it does not give concrete hyperparameter values or optimizer settings in the main text. It refers to Carreira-Perpiñán and Hada (2023) for more details.
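The "live regions" idea quoted in the Research Type row above can be made concrete with a short sketch. This is my own reconstruction, not the authors' code: a decision forest partitions the input space into regions, each identified by the tuple of leaves the trees assign to a point, and a region is "live" if at least one training point lands in it. The sketch below enumerates live regions with scikit-learn's `RandomForestClassifier.apply` and approximates the distance to each target-class region by its nearest training point; note that this corresponds to the dataset-instance search the paper uses as a baseline, whereas LIRE proper projects the source instance onto the closest live region's box.

```python
# Hedged sketch of searching live forest regions for a counterfactual.
# All names here are mine; the paper's own pseudocode is in its Figs. 1 and 4.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# forest.apply maps each sample to its leaf in every tree; the tuple of
# leaf ids identifies the sample's region. Grouping samples by that tuple
# enumerates exactly the live (nonempty) regions.
leaf_ids = forest.apply(X)                      # shape (n_samples, n_trees)
regions = {}                                    # leaf-id tuple -> row indices
for i, key in enumerate(map(tuple, leaf_ids)):
    regions.setdefault(key, []).append(i)

def closest_live_counterfactual(x, target_class):
    """Search live regions predicted as target_class; return the closest
    training point among them (an upper bound on the region distance)."""
    best, best_dist = None, np.inf
    for idxs in regions.values():
        pts = X[idxs]
        # All points in a region share every leaf, hence the same
        # prediction, so one representative classifies the whole region.
        if forest.predict(pts[:1])[0] != target_class:
            continue
        d = np.linalg.norm(pts - x, axis=1)
        j = int(d.argmin())
        if d[j] < best_dist:
            best_dist, best = float(d[j]), pts[j]
    return best, best_dist

x = X[0]
source_class = forest.predict(x[None])[0]
cf, dist = closest_live_counterfactual(x, 1 - source_class)
print(f"counterfactual found at distance {dist:.2f}")
```

For axis-aligned trees each region is a box, so the exact closest point in a candidate region is just a per-feature clip of the source instance (`np.clip(x, lo, hi)` given the box bounds), which is what makes this kind of search fast enough to run in under a second on the paper's datasets.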