What Changed? Interpretable Model Comparison

Authors: Rahul Nair, Massimiliano Mattetti, Elizabeth Daly, Dennis Wei, Oznur Alkan, Yunfeng Zhang

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "An empirical evaluation on several benchmark datasets illustrates the insights that may be obtained and shows that artificially induced changes can be reliably recovered by our method." |
| Researcher Affiliation | Industry | Rahul Nair, Massimiliano Mattetti, Elizabeth Daly, and Oznur Alkan (IBM Research Europe); Dennis Wei and Yunfeng Zhang (IBM Research Yorktown). |
| Pseudocode | No | The paper describes its algorithmic steps in paragraph form but contains no formal pseudocode block or clearly labeled algorithm. |
| Open Source Code | No | The paper uses an open-source implementation of the BRCG method (footnote 1 points to github.com/Trusted-AI/AIX360) and states that "the rule comparator and BRCG+ method were implemented in Python 3.7", but it does not state that the code for its own contributions (the rule comparator and BRCG+) is open source or publicly available. A hedged usage sketch for the cited AIX360 implementation appears after this table. |
| Open Datasets | No | The paper states that "an empirical evaluation of the proposed methods is performed on seven binary classification datasets" and that "full details are in the supplementary material", but the main paper gives no dataset names, links, DOIs, or formal citations that would enable public access. |
| Dataset Splits | No | The paper specifies an 80% training / 20% testing split but does not mention a validation split. |
| Hardware Specification | No | The paper gives no details about the hardware used to run its experiments. |
| Software Dependencies | Yes | "The rule comparator and BRCG+ method were implemented in Python 3.7 using CPLEX 12.10 as the solver for linear and integer programs." |
| Experiment Setup | Yes | "For BRCG-light, the complexity of explanation is controlled by two parameters λ0 and λ1... We perturb each dataset with perturbation probabilities p ∈ {0.5, 0.6, ..., 1.0}... We vary the grounding penalty c_u ∈ {0.0001, 0.001, 0.01, 0.1} to see its impact." A sketch of this parameter grid appears after the table. |
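
For readers who want to reproduce the rule-induction step, the sketch below shows one way to fit the open-source BRCG implementation the paper cites (github.com/Trusted-AI/AIX360). The dataset, random seed, and λ0/λ1 values are illustrative assumptions, not the paper's configuration; the rule comparator and BRCG+ themselves are not publicly released, so only the cited baseline is shown.

```python
# Minimal sketch: fitting the cited open-source BRCG implementation (AIX360).
# Dataset, random seed, and lambda values are assumptions for illustration.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from aix360.algorithms.rbm import BRCGExplainer, BooleanRuleCG, FeatureBinarizer

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# The paper reports an 80% train / 20% test split (no validation split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, test_size=0.2, random_state=0
)

# BRCG learns rules over binarized features; FeatureBinarizer ships with AIX360.
binarizer = FeatureBinarizer(negations=True)
X_train_b = binarizer.fit_transform(X_train)
X_test_b = binarizer.transform(X_test)

# lambda0 penalizes the number of clauses and lambda1 the number of conditions
# per clause -- the explanation-complexity parameters the paper calls λ0 and λ1.
explainer = BRCGExplainer(BooleanRuleCG(lambda0=1e-3, lambda1=1e-3))
explainer.fit(X_train_b, y_train)

print(explainer.explain()["rules"])  # learned rule set
accuracy = (explainer.predict(X_test_b) == y_test).mean()
print(f"test accuracy: {accuracy:.3f}")
```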
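
The experiment-setup row describes a sweep over perturbation probabilities and grounding penalties. The skeleton below spells out that grid; `run_comparison` is a hypothetical placeholder for the paper's unreleased rule-comparator pipeline, shown only to make the sweep structure concrete.

```python
# Sketch of the reported experiment grid. run_comparison is a hypothetical
# placeholder for the paper's (unreleased) rule-comparator pipeline.
from itertools import product

perturbation_probs = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  # p ∈ {0.5, 0.6, ..., 1.0}
grounding_penalties = [0.0001, 0.001, 0.01, 0.1]      # c_u values from the paper

for p, c_u in product(perturbation_probs, grounding_penalties):
    # Hypothetical step: perturb the dataset with probability p, refit the
    # model, and compare rule sets under grounding penalty c_u.
    # result = run_comparison(dataset, perturb_prob=p, grounding_penalty=c_u)
    print(f"perturbation p={p:.1f}, grounding penalty c_u={c_u}")
```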