Learning Interpretable Decision Rule Sets: A Submodular Optimization Approach

Authors: Fan Yang, Kai He, Linxiao Yang, Hongxia Du, Jingbang Yang, Bo Yang, Liang Sun

NeurIPS 2021

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments on real datasets demonstrate the effectiveness of our method. [...] Our experimental study is conducted on 20 public datasets.
Researcher Affiliation Industry Fan Yang, Kai He, Linxiao Yang, Hongxia Du, Jingbang Yang, Bo Yang, Liang Sun — DAMO Academy, Alibaba Group, Hangzhou, China. {fanyang.yf,kai.he,linxiao.ylx,hongxia.dhx,jingbang.yjb,muhai.yb,liang.sun}@alibaba-inc.com
Pseudocode Yes Algorithm 1 Rule set learning. [...] Algorithm 2 DS-OPT(R, u, w). [...] Algorithm 3 Local combinatorial search.
Open Source Code No The paper references third-party implementations and code (e.g., for BRS and RIPPER) but does not provide open-source code for its own described methodology.
Open Datasets Yes Our experimental study is conducted on 20 public datasets. Fifteen of them are from the UCI repository [16], and the other five are variants of the Pro Publica recidivism dataset (COMPAS) [29] and the Fair Isaac credit risk dataset (FICO) [21]. [...] [16] Dheeru Dua and Casey Graff. UCI machine learning repository, 2017. URL http://archive.ics.uci.edu/ml. [...] [29] Jeff Larson, Surya Mattu, Lauren Kirchner, and Julia Angwin. How we analyzed the COMPAS recidivism algorithm. Pro Publica, 2016. [...] [21] FICO, Google, Imperial College London, MIT, University of Oxford, UC Irvine, and UC Berkeley. Explainable machine learning challenge, 2018. URL https://community.fico.com/s/explainable-machine-learning-challenge.
Dataset Splits Yes We estimate numerical results based on 10-fold stratified cross-validation (CV). In each CV fold, we use grid search to optimize the hyperparameters of each algorithm on the training split.
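The stratified splitting protocol quoted above can be sketched in plain Python. This is a minimal illustration of what a stratified k-fold assignment does (each fold preserves the class proportions of the full dataset), not the authors' implementation; the function name and round-robin scheme are assumptions for illustration.

```python
from collections import defaultdict

def stratified_kfold_indices(labels, k=10):
    """Assign each sample index to one of k folds, preserving class balance.

    Samples of each class are dealt round-robin across the folds, so every
    fold receives roughly len(class) / k samples of that class.
    """
    per_class = defaultdict(list)
    for idx, y in enumerate(labels):
        per_class[y].append(idx)

    folds = defaultdict(list)
    for y, idxs in per_class.items():
        for j, idx in enumerate(idxs):
            folds[j % k].append(idx)
    return [sorted(folds[i]) for i in range(k)]

# Toy dataset: 60 negatives, 40 positives.
labels = [0] * 60 + [1] * 40
folds = stratified_kfold_indices(labels, k=10)
# Each of the 10 folds holds ~6 negatives and ~4 positives.
```

In practice this is what `sklearn.model_selection.StratifiedKFold` provides; in each CV fold the held-out part is used for evaluation while the hyperparameter grid search runs on the training part only.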
Hardware Specification No The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It only mentions general concepts like "Bit vectors are used in our implementation to process a large number of samples efficiently." and "Scalability test."
Software Dependencies No The paper mentions software such as the "scikit-learn package [40]" but does not specify version numbers or a complete list of software dependencies needed to reproduce the experiments.
Experiment Setup Yes For the method proposed in this paper, we fix β0 = β1 = 1 and optimize the remaining hyperparameters β2 ∈ {0.5, 0.1, 0.01}, λ ∈ {0.1, 1, 4, 8, 16, 64} and K ∈ {8, 16, 32}. The hyperparameters of CG include the strength of complexity penalty and the beam width, for which we sweep in {0.001, 0.002, 0.005} and {10, 20}, respectively. For RIPPER, the proportion of training set used for pruning is varied in {0.2, 0.25, …, 0.6}. For BRS, the maximum length of a rule is chosen from {3, 5}. For CART and RF, we tune the minimum number of samples at leaf nodes from 1 to 100 and fix the number of trees in RF to be 100.
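The quoted sweep for the proposed method implies an exhaustive grid over three hyperparameters. A small sketch of how such a grid expands into candidate configurations (the dictionary layout and helper name are illustrative, not from the paper):

```python
from itertools import product

# Hyperparameter grid quoted in the paper for the proposed method
# (beta0 = beta1 = 1 are held fixed; the rest are swept).
grid = {
    "beta2": [0.5, 0.1, 0.01],
    "lambda": [0.1, 1, 4, 8, 16, 64],
    "K": [8, 16, 32],
}

def grid_configs(grid):
    """Enumerate every hyperparameter combination for grid search."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(grid_configs(grid))
# 3 * 6 * 3 = 54 candidate configurations evaluated per CV fold.
```

With 10-fold CV this means 54 training runs of the method per fold for the proposed approach alone, which is the usual cost of the grid-search protocol described above.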