Very Fast, Approximate Counterfactual Explanations for Decision Forests
Authors: Miguel Á. Carreira-Perpiñán, Suryabhan Singh Hada
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose a simple but very effective approach: we constrain the optimization to only those input space regions defined by the forest that are populated by actual data points. ... In sections 4–5 we give our algorithm (LIRE) and evaluate it in section 6. ... Fig. 5 shows that the approximation (in terms of the distance to the source instance) is quite good... Table 1 compares LIRE with searching on the dataset instances, Feature Tweak (Tolomei et al. 2017) and OCEAN (Parmentier and Vidal 2021). We use several classification datasets of different size, dimensionality and type, and axis-aligned forests (Random Forests) of different size. |
| Researcher Affiliation | Academia | Miguel Á. Carreira-Perpiñán, Suryabhan Singh Hada, Dept. Computer Science & Engineering, University of California, Merced {mcarreira-perpinan, shada}@ucmerced.edu |
| Pseudocode | Yes | Figure 1: Pseudocode for finding all nonempty regions (top) and all live regions (bottom)... Figure 4: Pseudocode for the search for the closest live region to a source instance x for axis-aligned (top) and oblique trees (bottom). |
| Open Source Code | No | The paper does not provide an explicit statement about releasing code for the methodology or a link to a code repository. |
| Open Datasets | Yes | Fig. 2 shows the results for Random Forests (Breiman 2001) on several small datasets... Table 1: Comparison of different CE algorithms... for different datasets and Random Forests... Datasets: breast cancer (559,9,2), climate (432,18,2), spambase (3.6k,57,2), yeast (1162,8,10), letter (16k,16,26), MNIST (55k,784,10), MiniBooNE (104051,50,2), Swarm (18647,2400,2). |
| Dataset Splits | No | The paper mentions 'training, validation and test datasets used to train the forest' but does not provide specific percentages, sample counts, or citations to predefined splits, so the data partitioning cannot be reproduced. |
| Hardware Specification | No | The paper states 'All runtimes were obtained in a single core (without parallel processing)' and mentions that for axis-aligned trees it takes less than 1 second 'on a laptop,' but it does not specify any exact hardware details such as GPU/CPU models, processor types, or memory amounts. |
| Software Dependencies | No | The paper mentions tools like 'Random Forests', 'CART', 'TAO', and 'MATLAB' (in a footnote) but does not provide specific version numbers for any software dependencies required to replicate the experiment. |
| Experiment Setup | No | The paper describes the type of forests used ('Random Forests', 'oblique trees trained with TAO'), mentions 'individual trees grown in full, i.e., not pruned', and provides average tree depth and number of leaves. However, it does not specify concrete experimental setup details such as hyperparameter values (e.g., learning rate, batch size, number of epochs) or optimizer settings in the main text. It refers to 'Carreira-Perpiñán and Hada (2023)' for more details. |
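To make the "live regions" idea quoted above concrete: a region of a decision forest is *live* if at least one data point falls in it, and the paper restricts the counterfactual search to such regions. The sketch below is a deliberately crude, hypothetical approximation of that idea (not the paper's LIRE algorithm): it treats each correctly-predicted training point of the target class as a representative of a live region and returns the closest one in L2 distance. The function name `live_region_counterfactual` and the use of scikit-learn's `RandomForestClassifier` are our own assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

def live_region_counterfactual(forest, X_train, x_source, target_class):
    """Hypothetical sketch: among training points that the forest
    assigns to target_class (i.e., points lying in live regions of
    that class), return the one closest to x_source in L2 distance.
    This is a stand-in for an exact closest-live-region search."""
    preds = forest.predict(X_train)
    candidates = X_train[preds == target_class]
    if len(candidates) == 0:
        return None  # no live region of the target class
    dists = np.linalg.norm(candidates - x_source, axis=1)
    return candidates[np.argmin(dists)]

# Usage on one of the datasets the report lists (breast cancer).
X, y = load_breast_cancer(return_X_y=True)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

x = X[0]
src_class = rf.predict(x.reshape(1, -1))[0]
cf = live_region_counterfactual(rf, X, x, target_class=1 - src_class)
# cf is a real data point, so it is guaranteed to lie in a nonempty
# (live) region, and by construction the forest flips its prediction.
```

Because the returned counterfactual is an actual data point, it inherits the paper's key property for free: it lies in a populated region of input space, so it cannot be an out-of-distribution artifact of the optimizer.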