Very Fast, Approximate Counterfactual Explanations for Decision Forests

Authors: Miguel Á. Carreira-Perpiñán, Suryabhan Singh Hada

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We propose a simple but very effective approach: we constrain the optimization to only those input space regions defined by the forest that are populated by actual data points. ... In Sections 4–5 we give our algorithm (LIRE) and evaluate it in Section 6. ... Fig. 5 shows that the approximation (in terms of the distance to the source instance) is quite good... Table 1 compares LIRE with searching on the dataset instances, Feature Tweak (Tolomei et al. 2017) and OCEAN (Parmentier and Vidal 2021). We use several classification datasets of different size, dimensionality and type, and axis-aligned forests (Random Forests) of different size."
Researcher Affiliation | Academia | Miguel Á. Carreira-Perpiñán, Suryabhan Singh Hada, Dept. Computer Science & Engineering, University of California, Merced; {mcarreira-perpinan, shada}@ucmerced.edu
Pseudocode | Yes | "Figure 1: Pseudocode for finding all nonempty regions (top) and all live regions (bottom)" ... "Figure 4: Pseudocode for the search for the closest live region to a source instance x for axis-aligned (top) and oblique trees (bottom)."
Open Source Code | No | The paper neither states that code for the method is released nor links to a code repository.
Open Datasets | Yes | Fig. 2 shows results for Random Forests (Breiman 2001) on several small datasets, and Table 1 compares different CE algorithms across datasets and Random Forests. Datasets, listed as (size, dimensionality, classes): breast cancer (559, 9, 2), climate (432, 18, 2), spambase (3.6k, 57, 2), yeast (1162, 8, 10), letter (16k, 16, 26), MNIST (55k, 784, 10), MiniBooNE (104051, 50, 2), Swarm (18647, 2400, 2).
Dataset Splits | No | The paper mentions the 'training, validation and test datasets used to train the forest' but gives no percentages, sample counts, or citations to predefined splits, so the data partitioning cannot be reproduced.
Hardware Specification | No | The paper states 'All runtimes were obtained in a single core (without parallel processing)' and notes that for axis-aligned trees the search takes under 1 second 'on a laptop', but it gives no exact hardware details such as CPU/GPU models or memory amounts.
Software Dependencies | No | The paper mentions tools such as Random Forests, CART, TAO, and MATLAB (in a footnote), but gives no version numbers for any software dependency needed to replicate the experiments.
Experiment Setup | No | The paper describes the forests used ('Random Forests', 'oblique trees trained with TAO'), notes that individual trees are grown in full (i.e., not pruned), and reports average tree depth and number of leaves, but it does not give concrete hyperparameter values or optimizer settings in the main text. It refers to Carreira-Perpiñán and Hada (2023) for more details.
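The "live regions" idea quoted in the Research Type row above can be made concrete with a short sketch. This is my own reconstruction, not the authors' code: a decision forest partitions the input space into regions, each identified by the tuple of leaves the trees assign to a point, and a region is "live" if at least one training point lands in it. The sketch below enumerates live regions with scikit-learn's `RandomForestClassifier.apply` and approximates the distance to each target-class region by its nearest training point; note that this corresponds to the dataset-instance search the paper uses as a baseline, whereas LIRE proper projects the source instance onto the closest live region's box.

```python
# Hedged sketch of searching live forest regions for a counterfactual.
# All names here are mine; the paper's own pseudocode is in its Figs. 1 and 4.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# forest.apply maps each sample to its leaf in every tree; the tuple of
# leaf ids identifies the sample's region. Grouping samples by that tuple
# enumerates exactly the live (nonempty) regions.
leaf_ids = forest.apply(X)                      # shape (n_samples, n_trees)
regions = {}                                    # leaf-id tuple -> row indices
for i, key in enumerate(map(tuple, leaf_ids)):
    regions.setdefault(key, []).append(i)

def closest_live_counterfactual(x, target_class):
    """Search live regions predicted as target_class; return the closest
    training point among them (an upper bound on the region distance)."""
    best, best_dist = None, np.inf
    for idxs in regions.values():
        pts = X[idxs]
        # All points in a region share every leaf, hence the same
        # prediction, so one representative classifies the whole region.
        if forest.predict(pts[:1])[0] != target_class:
            continue
        d = np.linalg.norm(pts - x, axis=1)
        j = int(d.argmin())
        if d[j] < best_dist:
            best_dist, best = float(d[j]), pts[j]
    return best, best_dist

x = X[0]
source_class = forest.predict(x[None])[0]
cf, dist = closest_live_counterfactual(x, 1 - source_class)
print(f"counterfactual found at distance {dist:.2f}")
```

For axis-aligned trees each region is a box, so the exact closest point in a candidate region is just a per-feature clip of the source instance (`np.clip(x, lo, hi)` given the box bounds), which is what makes this kind of search fast enough to run in under a second on the paper's datasets.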