Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Efficient Learning of Interpretable Classification Rules
Authors: Bishwamittra Ghosh, Dmitry Malioutov, Kuldeep S. Meel
JAIR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, IMLI achieves the best balance among prediction accuracy, interpretability, and scalability. For instance, IMLI attains a competitive prediction accuracy and interpretability w.r.t. existing interpretable classifiers and demonstrates impressive scalability on large datasets where both interpretable and non-interpretable classifiers fail. As an application, we deploy IMLI in learning popular interpretable classifiers such as decision lists and decision sets. |
| Researcher Affiliation | Collaboration | Bishwamittra Ghosh EMAIL National University of Singapore Dmitry Malioutov EMAIL Kuldeep S. Meel EMAIL National University of Singapore |
| Pseudocode | Yes | Algorithm 1 MaxSAT-based Mini-batch Learning... Algorithm 2 Iterative CNF Classifier Learning... Algorithm 3 Iterative learning of decision lists... Algorithm 4 Iterative learning of decision sets |
| Open Source Code | Yes | The source code is available at https://github.com/meelgroup/mlic. |
| Open Datasets | Yes | We experiment with real-world binary classification datasets from UCI (Dua & Graff, 2017), Open-ML (Vanschoren et al., 2013), and Kaggle repository (https://www.kaggle.com/datasets), as listed in Table 1. |
| Dataset Splits | Yes | We perform ten-fold cross-validation on each dataset and evaluate the performance of different classifiers based on the median prediction accuracy on the test data. |
| Hardware Specification | Yes | We conduct each experiment on an Intel Xeon E7 8857 v2 CPU using a single core with 16 GB of RAM running on a 64bit Linux distribution based on Debian. For all classifiers, we set the training timeout to 1000 seconds. |
| Software Dependencies | No | To implement IMLI, we deploy a state-of-the-art MaxSAT solver Open-WBO (Martins et al., 2014), which returns the current best solution upon reaching a timeout. We compare IMLI with state-of-the-art interpretable and non-interpretable classifiers. Among interpretable classifiers, we compare with RIPPER (Cohen, 1995), BRL (Letham et al., 2015), CORELS (Angelino et al., 2017), and BRS (Wang et al., 2017). Among non-interpretable classifiers, we compare with Random Forest (RF), Support Vector Machine with linear kernels (SVM), Logistic Regression classifier (LR), and k-Nearest Neighbors classifier (kNN). We deploy the Scikit-learn library in Python for implementing non-interpretable classifiers. |
| Experiment Setup | Yes | For IMLI, we vary the number of clauses $k \in \{1, 2, \ldots, 5\}$ and the regularization parameter $\lambda$ in a logarithmic grid by choosing 5 values between $10^{-4}$ and $10^{1}$. In the mini-batch learning in IMLI, we set the number of samples in each mini-batch, $n \in \{50, 100, 200, 400\}$. |
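
The Experiment Setup and Dataset Splits rows describe a hyperparameter grid and a ten-fold cross-validation protocol scored by median test accuracy. The sketch below shows one way these could be enumerated in plain Python. It is an illustration only, not the authors' code: the function names, the `fit`/`predict` callables, the shuffling seed, and the assumption that the five λ values are evenly spaced in the exponent with both endpoints included are all hypothetical.

```python
import itertools
import random
import statistics

# Values taken from the Experiment Setup row.
CLAUSES = [1, 2, 3, 4, 5]            # number of clauses k
BATCH_SIZES = [50, 100, 200, 400]    # samples per mini-batch n
# ASSUMPTION: five values evenly spaced in the exponent, from 1e-4 to 1e1 inclusive.
LAMBDAS = [10 ** (-4 + i * 5 / 4) for i in range(5)]


def hyperparameter_grid():
    """Return every (k, lambda, n) combination as a list of tuples."""
    return list(itertools.product(CLAUSES, LAMBDAS, BATCH_SIZES))


def ten_fold_indices(num_samples, seed=0):
    """Shuffle the sample indices and deal them into ten disjoint folds.

    Requires num_samples >= 10 so that no fold is empty.
    """
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::10] for i in range(10)]


def median_cv_accuracy(features, labels, fit, predict, seed=0):
    """Median test accuracy over ten folds.

    `fit(train_x, train_y)` returns a model; `predict(model, x)` returns a label.
    Both are hypothetical placeholders for any classifier.
    """
    accuracies = []
    for test_idx in ten_fold_indices(len(labels), seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        correct = sum(predict(model, features[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return statistics.median(accuracies)
```

Under these assumptions the grid contains 5 × 5 × 4 = 100 configurations. The sketch does not say how the paper selected among configurations.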