Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Learning Interpretable Decision Rule Sets: A Submodular Optimization Approach
Authors: Fan Yang, Kai He, Linxiao Yang, Hongxia Du, Jingbang Yang, Bo Yang, Liang Sun
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on real datasets demonstrate the effectiveness of our method. [...] Our experimental study is conducted on 20 public datasets. |
| Researcher Affiliation | Industry | Fan Yang, Kai He, Linxiao Yang, Hongxia Du, Jingbang Yang, Bo Yang, Liang Sun; DAMO Academy, Alibaba Group, Hangzhou, China; {fanyang.yf,kai.he,linxiao.ylx,hongxia.dhx,jingbang.yjb,muhai.yb,liang.sun}@alibaba-inc.com |
| Pseudocode | Yes | Algorithm 1 Rule set learning. [...] Algorithm 2 DS-OPT(R, u, w). [...] Algorithm 3 Local combinatorial search. |
| Open Source Code | No | The paper references third-party implementations and code (e.g., for BRS and RIPPER) but does not provide open-source code for its own described methodology. |
| Open Datasets | Yes | Our experimental study is conducted on 20 public datasets. Fifteen of them are from the UCI repository [16], and the other five are variants of the ProPublica recidivism dataset (COMPAS) [29] and the Fair Isaac credit risk dataset (FICO) [21]. [...] [16] Dheeru Dua and Casey Graff. UCI machine learning repository, 2017. URL http://archive.ics.uci.edu/ml. [...] [29] Jeff Larson, Surya Mattu, Lauren Kirchner, and Julia Angwin. How we analyzed the COMPAS recidivism algorithm. ProPublica, 2016. [...] [21] FICO, Google, Imperial College London, MIT, University of Oxford, UC Irvine, and UC Berkeley. Explainable machine learning challenge, 2018. URL https://community.fico.com/s/explainable-machine-learning-challenge. |
| Dataset Splits | Yes | We estimate numerical results based on 10-fold stratified cross-validation (CV). In each CV fold, we use grid search to optimize the hyperparameters of each algorithm on the training split. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It only mentions general concepts like "Bit vectors are used in our implementation to process a large number of samples efficiently." and "Scalability test." |
| Software Dependencies | No | The paper mentions software such as "scikit-learn package [40]" and |
| Experiment Setup | Yes | For the method proposed in this paper, we fix β0 = β1 = 1 and optimize the remaining hyperparameters β2 ∈ {0.5, 0.1, 0.01}, λ ∈ {0.1, 1, 4, 8, 16, 64} and K ∈ {8, 16, 32}. The hyperparameters of CG include the strength of complexity penalty and the beam width, for which we sweep in {0.001, 0.002, 0.005} and {10, 20}, respectively. For RIPPER, the proportion of training set used for pruning is varied in {0.2, 0.25, ..., 0.6}. For BRS, the maximum length of a rule is chosen from {3, 5}. For CART and RF, we tune the minimum number of samples at leaf nodes from 1 to 100 and fix the number of trees in RF to be 100. |
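
The Dataset Splits and Experiment Setup rows describe 10-fold stratified cross-validation with a grid search on each training split. The following is a minimal, standard-library sketch of that protocol, not the authors' code (the paper does not release any). The function names `grid_points`, `stratified_folds`, and `cv_splits`, the key `lam` for λ, and the fixed random seed are all assumptions made for illustration. How candidates are scored within a training split is not stated in the quoted text, so that step is left out.

```python
import itertools
import random
from collections import defaultdict

# Grid quoted in the Experiment Setup row (beta0 = beta1 = 1 are held fixed).
# The dictionary keys are illustrative names, not identifiers from the paper.
GRID = {
    "beta2": [0.5, 0.1, 0.01],
    "lam": [0.1, 1, 4, 8, 16, 64],
    "K": [8, 16, 32],
}


def grid_points(grid):
    """Yield every hyperparameter combination in the grid as a dict."""
    names = list(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def stratified_folds(labels, n_folds=10, seed=0):
    """Deal sample indices into folds class by class to keep class ratios even."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    offset = 0  # continue the round-robin where the previous class stopped
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for i, idx in enumerate(indices):
            folds[(offset + i) % n_folds].append(idx)
        offset += len(indices)
    return folds


def cv_splits(labels, n_folds=10, seed=0):
    """Yield (train_indices, test_indices) once per fold."""
    folds = stratified_folds(labels, n_folds, seed)
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield sorted(train), sorted(test)
```

With this grid, each training split would be searched over 3 × 6 × 3 = 54 candidate settings for the proposed method. In practice, a library routine such as scikit-learn's `StratifiedKFold` would normally replace the hand-written splitter.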