Using MaxSAT for Efficient Explanations of Tree Ensembles
Authors: Alexey Ignatiev, Yacine Izza, Peter J. Stuckey, Joao Marques-Silva3776-3785
AAAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results obtained on a wide range of publicly available datasets demonstrate that the proposed Max SAT-based approach is either on par or outperforms the existing reasoning-based explainers, thus representing a robust and efficient alternative for computing formal explanations for TEs. |
| Researcher Affiliation | Academia | 1 Monash University, Melbourne, Australia 2 University of Toulouse, France 3 IRIT, CNRS, Toulouse, France |
| Pseudocode | Yes | Algorithm 1: Entailment check with Max SAT |
| Open Source Code | Yes | A prototype implementation6 of the proposed approach was developed as a Python script. It builds on (Ignatiev, Narodytska, and Marques Silva 2019c) and makes heavy use of the latest versions of the Py SMT and Py SAT toolkits (Gario and Micheli 2015; Ignatiev, Morgado, and Marques-Silva 2018). To our best knowledge, only two Max SAT solvers support formulas with real weights: LMHS (Saikko, Berg, and J arvisalo 2016) and RC2 (Ignatiev, Morgado, and Marques-Silva 2019), and only the latter is core-guided. Thus the prototype of the Max SAT-based explainer builds on Glucose 3 assisted (Audemard, Lagniez, and Simon 2013) RC2 and augments it with all the proposed heuristics. The SMT-based counterpart taken directly from (Ignatiev, Narodytska, and Marques Silva 2019c) and uses Z3 (de Moura and Bjørner 2008) as the underlying SMT engine. (Note that we additionally tested CPLEX as an oracle for the original SMT encoding. As it was significantly outperformed by the Z3-based solution, MIP is excluded from consideration below.) Both competitors support the computation of one AXp and also their enumeration (Ignatiev et al. 2020). Although both explainers support Quick Xplain-like AXp extraction (Junker 2004), it did not prove helpful in our initial results, and so the standard deletion-based Algorithm 2 was applied instead. Additionally to the aforementioned SMTand Max SAT- based AXp computation, we compared these formal rigorous explanation approaches to the computation of a heuristic explanation with the use of Anchor (Ribeiro, Singh, and Guestrin 2018). |
| Open Datasets | Yes | The experiments consider a selection of 21 publicly available datasets, which originate from UCI Machine Learning Repository (Aha et al. 2021) and Penn Machine Learning Benchmarks (Moore 2021). |
| Dataset Splits | No | No specific details on a separate validation split were provided. The paper mentions 'using 80% of the dataset samples (20% used for test data)', implying a train/test split, but no distinct validation set. |
| Hardware Specification | Yes | The experiments are performed on a Mac Book Pro with a Dual-Core Intel Core i5 2.3GHz CPU with 8GByte RAM running mac OS Big Sur. |
| Software Dependencies | Yes | Thus the prototype of the Max SAT-based explainer builds on Glucose 3 assisted (Audemard, Lagniez, and Simon 2013) RC2 and augments it with all the proposed heuristics. The SMT-based counterpart taken directly from (Ignatiev, Narodytska, and Marques Silva 2019c) and uses Z3 (de Moura and Bjørner 2008) as the underlying SMT engine. |
| Experiment Setup | Yes | All BT models are trained with XGBoost (Chen and Guestrin 2016) having 50 trees per class, each of depth at most 3 4 and using 80% of the dataset samples (20% used for test data). Also, the number of variables and constraints in the SMT (Max SAT, resp.) encoding vary from 104 to 2292 variables and 129 to 5771 assertions (4 to 10183 variables and 304 to 102611 clauses, resp.). |