Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning Hybrid Interpretable Models: Theory, Taxonomy, and Methods

Authors: Julien Ferry, Gabriel Laberge, Ulrich Aïvodji

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We finally show empirically that HybridCORELS is competitive with existing approaches and performs just as well as a standalone black-box (or even better) while being partly transparent.
Researcher Affiliation | Academia | Julien Ferry (EMAIL), Operations Research, Combinatorial Optimization and Constraints, LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France; Gabriel Laberge (EMAIL), Génie Informatique et Génie Logiciel, Polytechnique Montréal, Montréal, Canada; Ulrich Aïvodji (EMAIL), Software and Information Technology Engineering, École de Technologie Supérieure, Montréal, Canada
Pseudocode | Yes | The CORELS pseudo-code is presented as Algorithm 1 in Appendix B.2. We provide the HybridCORELSPost pseudo-code as Algorithm 2 in Appendix B.3. We provide the HybridCORELSPre pseudo-code as Algorithm 3 in Appendix B.3.
Open Source Code | Yes | Our algorithms HybridCORELSPost and HybridCORELSPre (as well as its HybridCORELSPre,NoCollab variant discussed in Appendix C) are integrated into a user-friendly Python module, publicly available on PyPI (https://pypi.org/project/HybridCORELS) and GitHub (https://github.com/ferryjul/HybridCORELS).
Open Datasets | Yes | The COMPAS dataset (analyzed by Angwin et al. (2016); https://raw.githubusercontent.com/propublica/compas-analysis/master/compas-scores-two-years.csv) contains 6,150 records from criminal offenders in the Broward County of Florida collected from 2013 and 2014. The UCI Adult Income dataset (Dua & Graff, 2017) stores demographic attributes of 48,842 individuals from the 1994 U.S. census. The ACS Employment dataset (Ding et al., 2021) is an extension of the UCI Adult Income dataset that includes more recent Census data (2014-2018).
Dataset Splits | Yes | For the three datasets presented in Section 5.1, experiments are run for five different train/test splits, with 80% of the data used for training and the remaining 20% for testing. For these experiments, each dataset was split into training (60%), validation (20%), and test (20%) sets. We randomly generate five such splits and average the results over them.
Hardware Specification | Yes | All experiments are run on a computing grid over a set of homogeneous nodes using Intel Platinum 8260 Cascade Lake @ 2.4 GHz CPU.
Software Dependencies | No | The paper mentions Scikit-learn, Hyperopt, and CORELS without specifying version numbers for these software components. For example: In all experiments we used the following Scikit-learn (Pedregosa et al., 2011) classifiers as black-boxes... Our algorithms... build upon the original CORELS (Angelino et al., 2017) C++ implementation and its Python wrapper.
Experiment Setup | Yes | For the prefix building part, we optimize the hyperparameters of HybridCORELSPre using grid search over the following values: λ ∈ {10^-2, 10^-3, 10^-4}, min_support ∈ {0.01, 0.05, 0.10}, and the objective-guided, lower-bound-guided, and BFS search policies. The Scikit-learn (Pedregosa et al., 2011) black-boxes are chosen to be either an AdaBoostClassifier with default parameters, a GradientBoostingClassifier with default parameters, or a RandomForestClassifier with min_samples_split = 10 and max_depth = 10. The black-box hyperparameters are tuned using the Hyperopt (Bergstra et al., 2013) Python library and its Tree of Parzen Estimators (TPE) algorithm, with 100 iterations.
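
The split and grid-search protocol quoted in the table can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names `make_splits` and `prefix_grid`, the policy labels, and the seeding scheme are hypothetical, and the regularization values assume negative exponents (10^-2, 10^-3, 10^-4). The sketch does not call the HybridCORELS package, whose API is not described in this summary.

```python
import itertools
import random

# Grid values quoted in the Experiment Setup row (negative exponents assumed).
LAMBDAS = [1e-2, 1e-3, 1e-4]
MIN_SUPPORTS = [0.01, 0.05, 0.10]
# Hypothetical labels for the objective-guided, lower-bound-guided, and BFS policies.
POLICIES = ["objective", "lower_bound", "bfs"]


def make_splits(n_samples, n_splits=5, seed=0):
    """Return n_splits random (train, validation, test) index triples (60/20/20)."""
    splits = []
    for k in range(n_splits):
        rng = random.Random(seed + k)  # assumed seeding: one seed per split
        indices = list(range(n_samples))
        rng.shuffle(indices)
        n_train = (6 * n_samples) // 10
        n_valid = (2 * n_samples) // 10
        train = indices[:n_train]
        valid = indices[n_train:n_train + n_valid]
        test = indices[n_train + n_valid:]  # remaining ~20%
        splits.append((train, valid, test))
    return splits


def prefix_grid():
    """Enumerate every (lambda, min_support, policy) combination of the grid."""
    return [
        {"lambda": lam, "min_support": ms, "policy": pol}
        for lam, ms, pol in itertools.product(LAMBDAS, MIN_SUPPORTS, POLICIES)
    ]
```

With three values per dimension the grid has 27 configurations. Under this sketch, each configuration would be fitted on the training indices of every split, selected on the validation indices, and reported as an average over the five test sets.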