Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
On Integrating Logical Analysis of Data into Random Forests
Authors: David Ing, Said Jabbour, Lakhdar Saïs
IJCAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct comparative experiments between our Random Forest using MSSes of LAD (RF-LAD) and the state-of-the-art Random Forest (RF) (from the scikit-learn Python library [Pedregosa et al., 2011]). The assessment is performed on 23 datasets, which are standard benchmarks originating from well-known repositories such as CP4IM (https://dtai-static.cs.kuleuven.be/CP4IM/datasets/), Kaggle (www.kaggle.com), OpenML (www.openml.org), and UCI (archive.ics.uci.edu/ml/). |
| Researcher Affiliation | Academia | David Ing, Said Jabbour, Lakhdar Saïs, CRIL, CNRS, Université d'Artois, France EMAIL |
| Pseudocode | Yes | Algorithm 1: Classical Random Forest, Algorithm 2: Random Forest Based LAD (RF-LAD) |
| Open Source Code | No | To derive such explanations from those RFs, we utilized the recent Random Forest explanation tool (RFxpl: https://github.com/izzayacine/RFxpl) proposed by Izza and Marques-Silva [Izza and Marques-Silva, 2021]. This describes a third-party tool used by the authors, not the release of their own code for RF-LAD. |
| Open Datasets | Yes | The assessment is performed on 23 datasets, which are standard benchmarks originating from well-known repositories such as CP4IM (https://dtai-static.cs.kuleuven.be/CP4IM/datasets/), Kaggle (www.kaggle.com), OpenML (www.openml.org), and UCI (archive.ics.uci.edu/ml/). |
| Dataset Splits | Yes | For every benchmark, a Repeated Stratified 10-fold cross-validation with 3 repetitions have been achieved to maintain the class distribution (i.e. to address imbalanced datasets). |
| Hardware Specification | Yes | Experiments are conducted on a computer equipped with Intel(R) Core(TM) i9-10900 CPU @ 2.80GHz with 62Gib of memory. |
| Software Dependencies | No | In this section, we conduct comparative experiments between our Random Forest using MSSes of LAD (RF-LAD) and the state-of-the-art Random Forest (RF) (from the scikit-learn Python library [Pedregosa et al., 2011]). To enumerate the MSSes, we use the multithreaded implementations provided by the algorithm pMMCS [Murakami and Uno, 2014], and set the number of threads to 20 to ensure diversification in terms of the generated MSSes. Specific version numbers for these software components are not provided. |
| Experiment Setup | Yes | The parameters of RFs are kept at their default values (i.e. in a forest, K = 100 DTs), except for the maximum depth, where we set different depths d ∈ {3, 4, 5}. To control the complexity, we modified pMMCS by limiting the number of MSSes to 100K (i.e. NS = 100K) for all the considered datasets. For a fair comparison, we randomly select 100 different MSSes (without redundancy) from the 100K generated MSSes, resulting in 100 different DTs to build our RF-LAD. |
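
The evaluation protocol quoted in the Dataset Splits and Experiment Setup rows can be approximated for the scikit-learn baseline alone. The sketch below is illustrative and is not the authors' code: it covers only the standard Random Forest (not RF-LAD, for which no code is released), and the synthetic dataset, the fixed `random_state` values, the accuracy metric, and the helper name `evaluate_baseline` are all assumptions introduced here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Assumption: a small, imbalanced synthetic dataset stands in for the
# 23 benchmark datasets used in the paper.
X, y = make_classification(
    n_samples=300, n_features=10, weights=[0.7, 0.3], random_state=0
)

# Repeated stratified 10-fold cross-validation with 3 repetitions,
# which preserves the class distribution in every fold.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=0)


def evaluate_baseline(X, y, depths=(3, 4, 5)):
    """Return the mean accuracy of the RF baseline for each maximum depth.

    Hypothetical helper: the metric (accuracy) and the seeds are assumptions.
    """
    results = {}
    for depth in depths:
        # K = 100 trees is the scikit-learn default, stated explicitly here.
        clf = RandomForestClassifier(
            n_estimators=100, max_depth=depth, random_state=0
        )
        scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
        results[depth] = scores.mean()
    return results


results = evaluate_baseline(X, y)
for depth, acc in results.items():
    print(f"max_depth={depth}: mean accuracy = {acc:.3f}")
```

Each depth is scored over 30 train/test splits (10 folds times 3 repetitions). Reproducing RF-LAD itself would additionally require an MSS enumerator such as pMMCS, which this sketch does not attempt.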