What Changed? Interpretable Model Comparison

Authors: Rahul Nair, Massimiliano Mattetti, Elizabeth Daly, Dennis Wei, Oznur Alkan, Yunfeng Zhang

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "An empirical evaluation on several benchmark datasets illustrates the insights that may be obtained and shows that artificially induced changes can be reliably recovered by our method." |
| Researcher Affiliation | Industry | Rahul Nair, Massimiliano Mattetti, Elizabeth Daly, and Oznur Alkan (IBM Research Europe); Dennis Wei and Yunfeng Zhang (IBM Research Yorktown). |
| Pseudocode | No | The paper describes its algorithmic steps in paragraph form but contains no formal pseudocode block or clearly labeled algorithm. |
| Open Source Code | No | The paper uses an open-source implementation of the BRCG method (footnote 1 points to github.com/Trusted-AI/AIX360) and states that "the rule comparator and BRCG+ method were implemented in Python 3.7", but it does not state that the code for its own contributions (the rule comparator and BRCG+) is open source or publicly available. A hedged usage sketch for the cited AIX360 implementation appears after this table. |
| Open Datasets | No | The paper states that "an empirical evaluation of the proposed methods is performed on seven binary classification datasets" and that "full details are in the supplementary material", but the main paper gives no dataset names, links, DOIs, or formal citations that would enable public access. |
| Dataset Splits | No | The paper specifies an 80% training / 20% testing split but does not mention a validation split. |
| Hardware Specification | No | The paper gives no details about the hardware used to run its experiments. |
| Software Dependencies | Yes | "The rule comparator and BRCG+ method were implemented in Python 3.7 using CPLEX 12.10 as the solver for linear and integer programs." |
| Experiment Setup | Yes | "For BRCG-light, the complexity of explanation is controlled by two parameters λ0 and λ1... We perturb each dataset with perturbation probabilities p ∈ {0.5, 0.6, ..., 1.0}... We vary the grounding penalty c_u ∈ {0.0001, 0.001, 0.01, 0.1} to see its impact." A sketch of this parameter grid appears after the table. |
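
For readers who want to reproduce the rule-induction step, the sketch below shows one way to fit the open-source BRCG implementation the paper cites (github.com/Trusted-AI/AIX360). The dataset, random seed, and λ0/λ1 values are illustrative assumptions, not the paper's configuration; the rule comparator and BRCG+ themselves are not publicly released, so only the cited baseline is shown.

```python
# Minimal sketch: fitting the cited open-source BRCG implementation (AIX360).
# Dataset, random seed, and lambda values are assumptions for illustration.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from aix360.algorithms.rbm import BRCGExplainer, BooleanRuleCG, FeatureBinarizer

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# The paper reports an 80% train / 20% test split (no validation split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, test_size=0.2, random_state=0
)

# BRCG learns rules over binarized features; FeatureBinarizer ships with AIX360.
binarizer = FeatureBinarizer(negations=True)
X_train_b = binarizer.fit_transform(X_train)
X_test_b = binarizer.transform(X_test)

# lambda0 penalizes the number of clauses and lambda1 the number of conditions
# per clause -- the explanation-complexity parameters the paper calls λ0 and λ1.
explainer = BRCGExplainer(BooleanRuleCG(lambda0=1e-3, lambda1=1e-3))
explainer.fit(X_train_b, y_train)

print(explainer.explain()["rules"])  # learned rule set
accuracy = (explainer.predict(X_test_b) == y_test).mean()
print(f"test accuracy: {accuracy:.3f}")
```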
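
The experiment-setup row describes a sweep over perturbation probabilities and grounding penalties. The skeleton below spells out that grid; `run_comparison` is a hypothetical placeholder for the paper's unreleased rule-comparator pipeline, shown only to make the sweep structure concrete.

```python
# Sketch of the reported experiment grid. run_comparison is a hypothetical
# placeholder for the paper's (unreleased) rule-comparator pipeline.
from itertools import product

perturbation_probs = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  # p ∈ {0.5, 0.6, ..., 1.0}
grounding_penalties = [0.0001, 0.001, 0.01, 0.1]      # c_u values from the paper

for p, c_u in product(perturbation_probs, grounding_penalties):
    # Hypothetical step: perturb the dataset with probability p, refit the
    # model, and compare rule sets under grounding penalty c_u.
    # result = run_comparison(dataset, perturb_prob=p, grounding_penalty=c_u)
    print(f"perturbation p={p:.1f}, grounding penalty c_u={c_u}")
```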